From dlong at openjdk.org Sat Mar 1 02:22:32 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 1 Mar 2025 02:22:32 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v4] In-Reply-To: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: > When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. > > In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. > > Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. 
Dean Long has updated the pull request incrementally with one additional commit since the last revision:

  use new Bytecode_invoke::has_member_arg

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23557/files
  - new: https://git.openjdk.org/jdk/pull/23557/files/ebf10dae..375f6cfe

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23557&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23557&range=02-03

  Stats: 13 lines in 4 files changed: 9 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/23557.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23557/head:pull/23557

PR: https://git.openjdk.org/jdk/pull/23557

From dlong at openjdk.org Sat Mar 1 02:22:32 2025
From: dlong at openjdk.org (Dean Long)
Date: Sat, 1 Mar 2025 02:22:32 GMT
Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v3]
In-Reply-To:
References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com>
Message-ID:

On Wed, 19 Feb 2025 00:37:14 GMT, Dean Long wrote:

>> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details.
>>
>> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports.
>>
>> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390.
>
> Dean Long has updated the pull request incrementally with one additional commit since the last revision:
>
>   Stricter assertion on ppc64

Thanks Patricio and Richard for the reviews. New commit pushed that adds Bytecode_invoke::has_member_arg as suggested by Richard.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23557#issuecomment-2691853937

From hgreule at openjdk.org Sat Mar 1 13:46:30 2025
From: hgreule at openjdk.org (Hannes Greule)
Date: Sat, 1 Mar 2025 13:46:30 GMT
Subject: RFR: 8350988: Consolidate Identity of self-inverse operations
Message-ID:

subnode has multiple nodes that are self-inverse but lack the respective optimization. ReverseINode and ReverseLNode already have the optimization, but we can deduplicate the code for all those operations.

For most nodes, the optimization is obvious. The NegF/DNodes, however, are worth looking at in detail imo:
- `Float.NaN` has the same bits set as `-Float.NaN`. That means, in this specific case, the operation is a no-op anyway
- For other values, the msb is flipped; flipping twice results in the original value again.

Similar changes could be made to the corresponding vector nodes. If you want, I can do that in a follow-up RFE.

One note: While benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like

    int v = 0;
    for (int datum : data) {
      v ^= Integer.reverseBytes(Integer.reverseBytes(datum));
    }
    return v;

was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases.
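
To make the self-inverse property concrete, here is a minimal Java sketch. It is not part of the PR, and the class and method names are hypothetical. It checks at the library level that applying each operation twice gives back the input. For floats it compares raw bits and assumes non-NaN inputs, since NaN bit patterns are not something this sketch relies on.

```java
public final class InvolutionDemo {
    // Byte reversal applied twice restores the original int.
    static boolean reverseBytesTwice(int x) {
        return Integer.reverseBytes(Integer.reverseBytes(x)) == x;
    }

    // Bit reversal applied twice restores the original long.
    static boolean reverseBitsTwice(long x) {
        return Long.reverse(Long.reverse(x)) == x;
    }

    // Negation flips only the sign bit, so negating twice restores the bits.
    // Assumption: f is not NaN; NaN payload handling is left out of this sketch.
    static boolean negateTwice(float f) {
        return Float.floatToRawIntBits(-(-f)) == Float.floatToRawIntBits(f);
    }

    public static void main(String[] args) {
        System.out.println(reverseBytesTwice(0x12345678));
        System.out.println(reverseBitsTwice(0x0123456789ABCDEFL));
        System.out.println(negateTwice(-0.0f));
    }
}
```

This only shows why folding `op(op(x))` to `x` is value-preserving for these operations; whether C2 actually performs the fold is what the IR tests in the PR are meant to verify.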
------------- Commit messages: - collapse impl, add more fitting nodes - test Changes: https://git.openjdk.org/jdk/pull/23851/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23851&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350988 Stats: 204 lines in 4 files changed: 179 ins; 8 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/23851.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23851/head:pull/23851 PR: https://git.openjdk.org/jdk/pull/23851 From hgreule at openjdk.org Sat Mar 1 13:46:30 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Sat, 1 Mar 2025 13:46:30 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations In-Reply-To: References: Message-ID: <8RU1UJbYq5dsJWwpIF_NOyPTpDNSDyUYAo06uApOHh4=.7744316c-fc1f-4785-a04b-cd4d59b01915@github.com> On Sat, 1 Mar 2025 13:34:30 GMT, Hannes Greule wrote: > subnode has multiple nodes that are self-inverse but lacking the respective optimization. ReverseINode and ReverseLNode already have the optimization, but we can deduplicate the code for all those operations. > > For most nodes, the optimization is obvious. The NegF/DNodes however are worth to look at in detail imo: > - `Float.NaN` has the same bits set as `-Float.NaN`. That means, it this specific case, the operation is a no-op anyway > - For other values, the msb is flipped, flipping twice results in the original value again. > > Similar changes could be made to the corresponding vector nodes. If you want, I can do that in a follow-up RFE. > > One note: During benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like > > int v = 0; > for (int datum : data) { > v ^= Integer.reverseBytes(Integer.reverseBytes(datum)); > } > return v; > > was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases. 
test/hotspot/jtreg/compiler/c2/irTests/InvolutionIdentityTests.java line 118:

> 116:
> 117:     @Test
> 118:     @IR(failOn = {IRNode.REVERSE_BYTES_I})

I'm not sure if this is fine as the ReverseBytes nodes depend on intrinsics. From my understanding, the methods are just seen as normal methods on platforms without reverseBytes support. In that case, the test would still pass, but it might be surprising that it passes. Is this fine or is there a better way here?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23851#discussion_r1976413355

From kvn at openjdk.org Sat Mar 1 19:57:53 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Sat, 1 Mar 2025 19:57:53 GMT
Subject: RFR: 8350893: Use generated names for hand generated opto runtime blobs
In-Reply-To:
References:
Message-ID:

On Thu, 27 Feb 2025 18:00:37 GMT, Andrew Dinn wrote:

> The two special case opto runtime blobs that support uncommon trap and exception handling are currently being generated using hard wired blob names determined by port-specific code. They should employ the standard blob names generated from shared declarations in file stubDeclarations.hpp.

@adinn I am printing assembler code (`-XX:CompileCommand=print,Test::test`) on Mac M1 for nmethod and see this with latest JDK:

    0x000000010ca8c468:   movz  x8, #0x6d00          ; {runtime_call StubRoutines (finalstubs)}
    0x000000010ca8c46c:   movk  x8, #0xca0, lsl #16
    0x000000010ca8c470:   movk  x8, #0x1, lsl #32
    0x000000010ca8c474:   blr   x8

before (jdk 23) it was:

    0x000000011115fe2c:   movz  x8, #0xf340          ; {runtime_call Stub::nmethod_entry_barrier}
    0x000000011115fe30:   movk  x8, #0x1106, lsl #16
    0x000000011115fe34:   movk  x8, #0x1, lsl #32
    0x000000011115fe38:   blr   x8

It could be something changed in code generation but **StubRoutines (finalstubs)** is not helpful.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/23829#issuecomment-2692387726

From rrich at openjdk.org Sat Mar 1 22:23:56 2025
From: rrich at openjdk.org (Richard Reingruber)
Date: Sat, 1 Mar 2025 22:23:56 GMT
Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v4]
In-Reply-To:
References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com>
Message-ID:

On Sat, 1 Mar 2025 02:22:32 GMT, Dean Long wrote:

>> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details.
>>
>> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports.
>>
>> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390.
>
> Dean Long has updated the pull request incrementally with one additional commit since the last revision:
>
>   use new Bytecode_invoke::has_member_arg

Marked as reviewed by rrich (Reviewer).

src/hotspot/share/runtime/vframeArray.cpp line 616:

> 614:       // invokedynamic instructions don't have a class but obviously don't have a MemberName appendix.
> 615:       // NOTE: Use machinery here that avoids resolving of any kind.
> 616:       const bool has_member_arg = inv.has_member_arg();

I reckon the comment about invokedynamic isn't needed anymore. It could be moved to has_member_arg if you want to keep it.
------------- PR Review: https://git.openjdk.org/jdk/pull/23557#pullrequestreview-2652589555 PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1976500470 From kbarrett at openjdk.org Sun Mar 2 23:39:04 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 2 Mar 2025 23:39:04 GMT Subject: RFR: 8345492: Fix -Wzero-as-null-pointer-constant warnings in adlc code In-Reply-To: References: <3SzBZUBz0SaRp9F7y0BX7WMqm_MkuHodLh8erPRYuWk=.a47cfc08-1c25-4b61-b7d2-3dd840e3b488@github.com> Message-ID: On Wed, 26 Feb 2025 16:43:14 GMT, Vladimir Kozlov wrote: >> Please review this trivial change to adlc, to use nullptr instead of literal 0 >> as a null pointer constant. >> >> Testing: mach5 tier1 >> Locally tested (linux-x64) with -Wzero-as-null-pointer-constant enabled to >> verify the warnings associated with this code were removed. > > Good. Thanks for reviews @vnkozlov and @dean-long ------------- PR Comment: https://git.openjdk.org/jdk/pull/23804#issuecomment-2692968875 From kbarrett at openjdk.org Sun Mar 2 23:39:05 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 2 Mar 2025 23:39:05 GMT Subject: Integrated: 8345492: Fix -Wzero-as-null-pointer-constant warnings in adlc code In-Reply-To: <3SzBZUBz0SaRp9F7y0BX7WMqm_MkuHodLh8erPRYuWk=.a47cfc08-1c25-4b61-b7d2-3dd840e3b488@github.com> References: <3SzBZUBz0SaRp9F7y0BX7WMqm_MkuHodLh8erPRYuWk=.a47cfc08-1c25-4b61-b7d2-3dd840e3b488@github.com> Message-ID: On Wed, 26 Feb 2025 15:11:25 GMT, Kim Barrett wrote: > Please review this trivial change to adlc, to use nullptr instead of literal 0 > as a null pointer constant. > > Testing: mach5 tier1 > Locally tested (linux-x64) with -Wzero-as-null-pointer-constant enabled to > verify the warnings associated with this code were removed. This pull request has now been integrated. 
Changeset: 0a1eea11 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/0a1eea112d9f709bac32908f216b8598e918ed33 Stats: 9 lines in 2 files changed: 0 ins; 0 del; 9 mod 8345492: Fix -Wzero-as-null-pointer-constant warnings in adlc code Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/23804 From xgong at openjdk.org Mon Mar 3 01:47:04 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 3 Mar 2025 01:47:04 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 01:18:57 GMT, Xiaohong Gong wrote: > The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. > > Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. > > This patch added the rearrange support for vector types with small lane count. Here are the main changes: > - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) > - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation > - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one > > Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. 
Performance of the same JMH with other vector types remains unchanged.
>
> 1) NEON
>
> JMH on panama-vector:vectorIntrinsics:
>
> Benchmark                     (size)  Mode   Cnt  Units   Before     After    Gain
> Double128Vector.rearrange     1024    thrpt  30   ops/ms  78.060   578.859   7.42x
> Double128Vector.sliceUnary    1024    thrpt  30   ops/ms  72.332  1811.664  25.05x
> Double128Vector.unsliceUnary  1024    thrpt  30   ops/ms  72.256  1812.344  25.08x
> Float64Vector.rearrange       1024    thrpt  30   ops/ms  77.879   558.797   7.18x
> Float64Vector.sliceUnary      1024    thrpt  30   ops/ms  70.528  1981.304  28.09x
> Float64Vector.unsliceUnary    1024    thrpt  30   ops/ms  71.735  1994.168  27.79x
> Int64Vector.rearrange         1024    thrpt  30   ops/ms  76.374   562.106   7.36x
> Int64Vector.sliceUnary        1024    thrpt  30   ops/ms  71.680  1190.127  16.60x
> Int64Vector.unsliceUnary      1024    thrpt  30   ops/ms  71.895  1185.094  16.48x
> Long128Vector.rearrange       1024    thrpt  30   ops/ms  78.902   579.250   7.34x
> Long128Vector.sliceUnary      1024    thrpt  30   ops/ms  72.389   747.794  10.33x
> Long128Vector.unsliceUnary    1024    thrpt  30   ops/ms  71.999   747.848  10.38x
>
>
> JMH on jdk mainline:
>
> Benchmark ...

Hi, could anyone please help take a look at this PR? Thanks a lot!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2693067982

From xgong at openjdk.org Mon Mar 3 02:26:03 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Mon, 3 Mar 2025 02:26:03 GMT
Subject: RFR: 8350748: VectorAPI: Method "checkMaskFromIndexSize" should be force inlined
In-Reply-To:
References: <18Q2Zl2ip_eFS_Y4fflgS8XYBkbwCZ468DIjP3KwhDE=.240f4182-4b02-4fac-97c8-ac659427e4a8@github.com>
Message-ID: <5MHf8-aP7o8WftHBIn6o-t2ZuMJfSLrhxnTvi2qGBq8=.7e9c5210-ed9f-4ab1-8d58-80e175ff019e@github.com>

On Thu, 27 Feb 2025 23:30:29 GMT, Paul Sandoz wrote:

>> Method `checkMaskFromIndexSize` is called by some vector masked APIs like `fromArray/intoArray/fromMemorySegment/...`. It is used to check whether the index of any active lanes in a mask will reach out of the boundary of the given Array/MemorySegment.
This function should be force inlined, or a VectorMask object is generated once the function call is not inlined by C2 compiler, which affects the API performance a lot.
>>
>> This patch changed to call the `VectorMask.checkFromIndexSize` method directly inside of these APIs instead of `checkMaskFromIndexSize`. Since it has added the `@ForceInline` annotation already, it will be inlined and intrinsified by C2. And then the expected vector instructions can be generated. With this change, the unused `checkMaskFromIndexSize` can be removed.
>>
>> Performance of some JMH benchmarks can improve up to 14x on a NVIDIA Grace CPU (AArch64 SVE2, 128-bit vectors). We can also observe the similar performance improvement on a Intel CPU which supports AVX512.
>>
>> Following is the performance data on Grace:
>>
>>
>> Benchmark                                            Mode   Cnt  Units      Before      After  Gain
>> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE      thrpt  30   ops/ms  31544.304  31610.598  1.002
>> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE    thrpt  30   ops/ms   3896.202   3903.249  1.001
>> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE     thrpt  30   ops/ms    570.415   7174.320  12.57
>> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE       thrpt  30   ops/ms    566.694   7193.520  12.69
>> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE      thrpt  30   ops/ms   3899.269   3878.258  0.994
>> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE     thrpt  30   ops/ms   1134.301  16053.847  14.15
>> StoreMaskedIOOBEBenchmark.byteStoreArrayMaskIOOBE    thrpt  30   ops/ms  26449.558  28699.480  1.085
>> StoreMaskedIOOBEBenchmark.doubleStoreArrayMaskIOOBE  thrpt  30   ops/ms   1922.167   5781.077  3.007
>> StoreMaskedIOOBEBenchmark.floatStoreArrayMaskIOOBE   thrpt  30   ops/ms   3784.190  11789.276  3.115
>> StoreMaskedIOOBEBenchmark.intStoreArrayMaskIOOBE     thrpt  30   ops/ms   3694.082  15633.547  4.232
>> StoreMaskedIOOBEBenchmark.longStoreArrayMaskIOOBE    thrpt  30   ops/ms   1966.956   6049.790  3.075
>> StoreMaskedIOOBEBenchmark.shortStoreArrayMaskIOOBE   thrpt  30   ops/ms   7647.309  27412.387  3.584
>
> Marked as reviewed by
psandoz (Reviewer). Thanks a lot for your review @PaulSandoz ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23817#issuecomment-2693122599 From xgong at openjdk.org Mon Mar 3 02:26:04 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 3 Mar 2025 02:26:04 GMT Subject: Integrated: 8350748: VectorAPI: Method "checkMaskFromIndexSize" should be force inlined In-Reply-To: <18Q2Zl2ip_eFS_Y4fflgS8XYBkbwCZ468DIjP3KwhDE=.240f4182-4b02-4fac-97c8-ac659427e4a8@github.com> References: <18Q2Zl2ip_eFS_Y4fflgS8XYBkbwCZ468DIjP3KwhDE=.240f4182-4b02-4fac-97c8-ac659427e4a8@github.com> Message-ID: On Thu, 27 Feb 2025 06:43:19 GMT, Xiaohong Gong wrote: > Method `checkMaskFromIndexSize` is called by some vector masked APIs like `fromArray/intoArray/fromMemorySegment/...`. It is used to check whether the index of any active lanes in a mask will reach out of the boundary of the given Array/MemorySegment. This function should be force inlined, or a VectorMask object is generated once the function call is not inlined by C2 compiler, which affects the API performance a lot. > > This patch changed to call the `VectorMask.checkFromIndexSize` method directly inside of these APIs instead of `checkMaskFromIndexSize`. Since it has added the `@ForceInline` annotation already, it will be inlined and intrinsified by C2. And then the expected vector instructions can be generated. With this change, the unused `checkMaskFromIndexSize` can be removed. > > Performance of some JMH benchmarks can improve up to 14x on a NVIDIA Grace CPU (AArch64 SVE2, 128-bit vectors). We can also observe the similar performance improvement on a Intel CPU which supports AVX512. 
>
> Following is the performance data on Grace:
>
>
> Benchmark                                            Mode   Cnt  Units      Before      After  Gain
> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE      thrpt  30   ops/ms  31544.304  31610.598  1.002
> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE    thrpt  30   ops/ms   3896.202   3903.249  1.001
> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE     thrpt  30   ops/ms    570.415   7174.320  12.57
> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE       thrpt  30   ops/ms    566.694   7193.520  12.69
> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE      thrpt  30   ops/ms   3899.269   3878.258  0.994
> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE     thrpt  30   ops/ms   1134.301  16053.847  14.15
> StoreMaskedIOOBEBenchmark.byteStoreArrayMaskIOOBE    thrpt  30   ops/ms  26449.558  28699.480  1.085
> StoreMaskedIOOBEBenchmark.doubleStoreArrayMaskIOOBE  thrpt  30   ops/ms   1922.167   5781.077  3.007
> StoreMaskedIOOBEBenchmark.floatStoreArrayMaskIOOBE   thrpt  30   ops/ms   3784.190  11789.276  3.115
> StoreMaskedIOOBEBenchmark.intStoreArrayMaskIOOBE     thrpt  30   ops/ms   3694.082  15633.547  4.232
> StoreMaskedIOOBEBenchmark.longStoreArrayMaskIOOBE    thrpt  30   ops/ms   1966.956   6049.790  3.075
> StoreMaskedIOOBEBenchmark.shortStoreArrayMaskIOOBE   thrpt  30   ops/ms   7647.309  27412.387  3.584

This pull request has now been integrated.
Changeset: d48ddfe4 Author: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/d48ddfe49a4e0b07949912d3c91d6f4737658b3e Stats: 213 lines in 7 files changed: 36 ins; 140 del; 37 mod 8350748: VectorAPI: Method "checkMaskFromIndexSize" should be force inlined Reviewed-by: psandoz ------------- PR: https://git.openjdk.org/jdk/pull/23817 From liach at openjdk.org Mon Mar 3 03:29:59 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 3 Mar 2025 03:29:59 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations In-Reply-To: <8RU1UJbYq5dsJWwpIF_NOyPTpDNSDyUYAo06uApOHh4=.7744316c-fc1f-4785-a04b-cd4d59b01915@github.com> References: <8RU1UJbYq5dsJWwpIF_NOyPTpDNSDyUYAo06uApOHh4=.7744316c-fc1f-4785-a04b-cd4d59b01915@github.com> Message-ID: <14bzIjSRLU1mOwCli60u7AnnrH8u-X2b81DIHXttdvU=.aaeb755d-2d55-43d8-82f5-8aab013b0f34@github.com> On Sat, 1 Mar 2025 13:41:31 GMT, Hannes Greule wrote: >> subnode has multiple nodes that are self-inverse but lacking the respective optimization. ReverseINode and ReverseLNode already have the optimization, but we can deduplicate the code for all those operations. >> >> For most nodes, the optimization is obvious. The NegF/DNodes however are worth to look at in detail imo: >> - `Float.NaN` has the same bits set as `-Float.NaN`. That means, it this specific case, the operation is a no-op anyway >> - For other values, the msb is flipped, flipping twice results in the original value again. >> >> Similar changes could be made to the corresponding vector nodes. If you want, I can do that in a follow-up RFE. >> >> One note: During benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like >> >> int v = 0; >> for (int datum : data) { >> v ^= Integer.reverseBytes(Integer.reverseBytes(datum)); >> } >> return v; >> >> was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases. 
> > test/hotspot/jtreg/compiler/c2/irTests/InvolutionIdentityTests.java line 118: > >> 116: >> 117: @Test >> 118: @IR(failOn = {IRNode.REVERSE_BYTES_I}) > > I'm not sure if this is fine as the ReverseBytes nodes depend on intrinsics. From my understanding, the methods are just seen as normal methods on platforms without reverseBytes support. In that case, the test would still pass, but it might be surprising that it passes. Is this fine or is there a better way here? Interesting question - expanding on that, could arbitrary methods be marked as self-inverse to be represented in the IR? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23851#discussion_r1976828917 From jkarthikeyan at openjdk.org Mon Mar 3 05:21:59 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 3 Mar 2025 05:21:59 GMT Subject: RFR: 8349637: Integer.numberOfLeadingZeros outputs incorrectly in certain cases [v3] In-Reply-To: References: Message-ID: <9OMLTcJfU1lih2CTHWn7S_jbcjhKG64gPA7yoxH_wHk=.f276b959-7a4f-447c-bcfd-56944efaa02a@github.com> On Fri, 14 Feb 2025 14:05:26 GMT, Tobias Hartmann wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve explanation of logic > > Nice analysis, Emanuel. > > Here's my test: https://github.com/openjdk/jdk/commit/7fd87d3013d23e151c98e451c6cd07bf55b9507b > > @jaskarth could you please add something similar to this PR? At least for the integer / long cases and also for the trailing zero case. I'm fine with fixing the short/byte issue separately. Thanks for the reviews, @TobiHartmann, @merykitty, @jatin-bhateja, and everyone else for the comments! 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/23579#issuecomment-2693302991

From jkarthikeyan at openjdk.org Mon Mar 3 05:22:00 2025
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Mon, 3 Mar 2025 05:22:00 GMT
Subject: Integrated: 8349637: Integer.numberOfLeadingZeros outputs incorrectly in certain cases
In-Reply-To:
References:
Message-ID:

On Wed, 12 Feb 2025 05:47:52 GMT, Jasmine Karthikeyan wrote:

> Hi all,
> This is a fix for a miscompile in the AVX2 implementation of `CountLeadingZerosV` for int types. Currently, the implementation turns ints into floats, in order to calculate the leading zeros based on the exponent part of the float. Unfortunately, floats can only represent every integer exactly up to 2^24. After that, multiple integer values can map onto the same floating point value. The issue manifests when an int is converted to a floating point representation that is higher than it, crossing a bit boundary. As an example, `(float)0x01FFFFFF == (float)0x02000000`, but `lzcnt(0x01FFFFFF) == 7` and `lzcnt(0x02000000) == 6`. The values are incorrectly rounded up.
>
> This patch fixes the issue by masking the input in the cases where it is larger than 2^24, to set the low bits to 0. Removing these bits prevents the accidental rounding behavior. I've added these cases to `TestNumberOfContinuousZeros`, and removed the set random seed so that it can produce random inputs to test with.
>
> Reviews would be appreciated!

This pull request has now been integrated.
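
The rounding problem described in the quoted text can be reproduced in plain scalar Java. The sketch below is only a model of the idea, not the AVX2 code from the patch. The class and method names are hypothetical, and the particular mask (clearing the low 8 bits once the value reaches 2^24) is an assumption chosen so that at most 24 significant bits remain; the real implementation may mask differently. Both helpers assume `x > 0`.

```java
public final class LeadingZerosViaFloat {
    // Exponent trick without masking, valid only for x > 0.
    // (float) x can round up across a power of two, which bumps the exponent.
    static int naive(int x) {
        return 31 - Math.getExponent((float) x);
    }

    // Same trick, but the low 8 bits are cleared whenever x >= 2^24,
    // so the int-to-float conversion is exact and cannot round upward.
    // The highest set bit is untouched, so the leading-zero count is preserved.
    static int masked(int x) {
        int m = (x >>> 24) != 0 ? (x & 0xFFFFFF00) : x;
        return 31 - Math.getExponent((float) m);
    }

    public static void main(String[] args) {
        int x = 0x01FFFFFF;
        System.out.println(Integer.numberOfLeadingZeros(x)); // 7
        System.out.println(naive(x));                        // 6, rounded up to 2^25
        System.out.println(masked(x));                       // 7
    }
}
```

For `0x01FFFFFF` the unmasked conversion lands on 2^25, which is exactly the off-by-one case quoted above.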
Changeset: 3657e92e Author: Jasmine Karthikeyan URL: https://git.openjdk.org/jdk/commit/3657e92ead1e678942fcb272e77c3867eb5aa13e Stats: 222 lines in 3 files changed: 215 ins; 0 del; 7 mod 8349637: Integer.numberOfLeadingZeros outputs incorrectly in certain cases Reviewed-by: thartmann, qamai, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/23579 From chagedorn at openjdk.org Mon Mar 3 06:47:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 3 Mar 2025 06:47:59 GMT Subject: RFR: 8349523: Unused runtime calls to drem/frem should be removed [v8] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 16:04:31 GMT, Marc Chevalier wrote: >> Remove frem and drem macros nodes when the result is not used. These nodes have other outputs (like memory), which is not meaningful, but preventing them to be dropped so easily. This patch removes the useless frem/drem nodes, and by rewiring the inputs to the outputs. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address comments Marked as reviewed by chagedorn (Reviewer). Looks good! ------------- PR Review: https://git.openjdk.org/jdk/pull/23694#pullrequestreview-2653192717 PR Comment: https://git.openjdk.org/jdk/pull/23694#issuecomment-2693428207 From thartmann at openjdk.org Mon Mar 3 07:05:00 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 3 Mar 2025 07:05:00 GMT Subject: RFR: 8349523: Unused runtime calls to drem/frem should be removed [v8] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 16:04:31 GMT, Marc Chevalier wrote: >> Remove frem and drem macros nodes when the result is not used. These nodes have other outputs (like memory), which is not meaningful, but preventing them to be dropped so easily. This patch removes the useless frem/drem nodes, and by rewiring the inputs to the outputs. 
>> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address comments Great work, Marc. This is a good preparation for [JDK-8347901](https://bugs.openjdk.org/browse/JDK-8347901). I added a few minor comments. src/hotspot/share/opto/divnode.cpp line 1520: > 1518: PhaseIterGVN* igvn = phase->is_IterGVN(); > 1519: > 1520: bool result_is_ignored = proj_out_or_null(TypeFunc::Parms) == nullptr; I think `result_is_unused` might be better wording (same below). src/hotspot/share/opto/node.cpp line 2948: > 2946: // `maybe_pure_function` is assumed to be the input of `this`. This is a bit redundant, > 2947: // but we already have and need maybe_pure_function in all the call sites, so > 2948: // it makes obvious that the `maybe_pure_function` is the same node as in the caller, Suggestion: // it makes it obvious that the `maybe_pure_function` is the same node as in the caller, src/hotspot/share/opto/node.cpp line 2952: > 2950: // the local in the caller. > 2951: bool Node::is_data_proj_of_pure_function(const Node* maybe_pure_function) const { > 2952: return Opcode() == Op_Proj && static_cast(this)->_con == TypeFunc::Parms && maybe_pure_function->is_pure_function(); You should use `as_Proj()` here instead of the static cast. ------------- PR Review: https://git.openjdk.org/jdk/pull/23694#pullrequestreview-2653213655 PR Review Comment: https://git.openjdk.org/jdk/pull/23694#discussion_r1976968908 PR Review Comment: https://git.openjdk.org/jdk/pull/23694#discussion_r1976967349 PR Review Comment: https://git.openjdk.org/jdk/pull/23694#discussion_r1976966281 From duke at openjdk.org Mon Mar 3 07:43:41 2025 From: duke at openjdk.org (Marc Chevalier) Date: Mon, 3 Mar 2025 07:43:41 GMT Subject: RFR: 8349523: Unused runtime calls to drem/frem should be removed [v9] In-Reply-To: References: Message-ID: > Remove frem and drem macros nodes when the result is not used. 
These nodes have other outputs (like memory), which is not meaningful, but preventing them to be dropped so easily. This patch removes the useless frem/drem nodes, and by rewiring the inputs to the outputs. > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23694/files - new: https://git.openjdk.org/jdk/pull/23694/files/bfd185d8..a067f613 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23694&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23694&range=07-08 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23694.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23694/head:pull/23694 PR: https://git.openjdk.org/jdk/pull/23694 From duke at openjdk.org Mon Mar 3 07:43:42 2025 From: duke at openjdk.org (Marc Chevalier) Date: Mon, 3 Mar 2025 07:43:42 GMT Subject: RFR: 8349523: Unused runtime calls to drem/frem should be removed [v8] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 07:01:49 GMT, Tobias Hartmann wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> address comments > > src/hotspot/share/opto/divnode.cpp line 1520: > >> 1518: PhaseIterGVN* igvn = phase->is_IterGVN(); >> 1519: >> 1520: bool result_is_ignored = proj_out_or_null(TypeFunc::Parms) == nullptr; > > I think `result_is_unused` might be better wording (same below). No strong opinion. Done. > src/hotspot/share/opto/node.cpp line 2948: > >> 2946: // `maybe_pure_function` is assumed to be the input of `this`. 
This is a bit redundant, >> 2947: // but we already have and need maybe_pure_function in all the call sites, so >> 2948: // it makes obvious that the `maybe_pure_function` is the same node as in the caller, > > Suggestion: > > // it makes it obvious that the `maybe_pure_function` is the same node as in the caller, Done. > src/hotspot/share/opto/node.cpp line 2952: > >> 2950: // the local in the caller. >> 2951: bool Node::is_data_proj_of_pure_function(const Node* maybe_pure_function) const { >> 2952: return Opcode() == Op_Proj && static_cast(this)->_con == TypeFunc::Parms && maybe_pure_function->is_pure_function(); > > You should use `as_Proj()` here instead of the static cast. Alright! Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23694#discussion_r1977008639 PR Review Comment: https://git.openjdk.org/jdk/pull/23694#discussion_r1977009207 PR Review Comment: https://git.openjdk.org/jdk/pull/23694#discussion_r1977009610 From thartmann at openjdk.org Mon Mar 3 08:27:54 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 3 Mar 2025 08:27:54 GMT Subject: RFR: 8349523: Unused runtime calls to drem/frem should be removed [v9] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 07:43:41 GMT, Marc Chevalier wrote: >> Remove frem and drem macros nodes when the result is not used. These nodes have other outputs (like memory), which is not meaningful, but preventing them to be dropped so easily. This patch removes the useless frem/drem nodes, and by rewiring the inputs to the outputs. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address comments Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23694#pullrequestreview-2653374093 From duke at openjdk.org Mon Mar 3 08:43:57 2025 From: duke at openjdk.org (Marc Chevalier) Date: Mon, 3 Mar 2025 08:43:57 GMT Subject: RFR: 8349523: Unused runtime calls to drem/frem should be removed [v9] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 07:43:41 GMT, Marc Chevalier wrote: >> Remove frem and drem macros nodes when the result is not used. These nodes have other outputs (like memory), which is not meaningful, but preventing them to be dropped so easily. This patch removes the useless frem/drem nodes, and by rewiring the inputs to the outputs. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address comments Thanks everyone for review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23694#issuecomment-2693636757 From duke at openjdk.org Mon Mar 3 08:43:57 2025 From: duke at openjdk.org (duke) Date: Mon, 3 Mar 2025 08:43:57 GMT Subject: RFR: 8349523: Unused runtime calls to drem/frem should be removed [v9] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 07:43:41 GMT, Marc Chevalier wrote: >> Remove frem and drem macros nodes when the result is not used. These nodes have other outputs (like memory), which is not meaningful, but preventing them to be dropped so easily. This patch removes the useless frem/drem nodes, and by rewiring the inputs to the outputs. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address comments @marc-chevalier Your change (at version a067f613a50f40d4697701ab00707c6c30af0553) is now ready to be sponsored by a Committer. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/23694#issuecomment-2693639173

From bkilambi at openjdk.org Mon Mar 3 09:08:52 2025
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Mon, 3 Mar 2025 09:08:52 GMT
Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector
In-Reply-To: <2DXecFoDHdgQSnZFZ-gqmXRxXz0nU47Eg3clS5_q1bo=.822a4d7c-988c-4f08-ad22-a81cf9fd1484@github.com>
References: <-exSdNf1CuxqYL--Mi4-L1m2Gop9bPIvdgqQEpAUIeM=.5f4936a7-31d4-45b7-bddf-e973b3687c18@github.com> <2DXecFoDHdgQSnZFZ-gqmXRxXz0nU47Eg3clS5_q1bo=.822a4d7c-988c-4f08-ad22-a81cf9fd1484@github.com>
Message-ID:

On Thu, 27 Feb 2025 02:04:24 GMT, Xiaohong Gong wrote:

>> Thank you for your inputs. I'll look into this.
>
> Hi @Bhavana-Kilambi, I've created a new PR https://github.com/openjdk/jdk/pull/23790 to implement `VectorRearrange` for small lane count vector types like `2D`. I think the implementation is much the same as what we discussed here. Please let me know if you have any feedback. Thanks!

@XiaohongGong thank you! I will check it out. Apologies for being so slow in responding (got pulled into something else). I will update this PR with my latest patch soon. Thanks!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23570#discussion_r1977121908

From dfenacci at openjdk.org Mon Mar 3 09:22:52 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Mon, 3 Mar 2025 09:22:52 GMT
Subject: RFR: 8350988: Consolidate Identity of self-inverse operations
In-Reply-To:
References:
Message-ID:

On Sat, 1 Mar 2025 13:34:30 GMT, Hannes Greule wrote:

> subnode has multiple nodes that are self-inverse but lacking the respective optimization. ReverseINode and ReverseLNode already have the optimization, but we can deduplicate the code for all those operations.
>
> For most nodes, the optimization is obvious. The NegF/DNodes however are worth looking at in detail, imo:
> - `Float.NaN` has the same bits set as `-Float.NaN`.
>   That means, in this specific case, the operation is a no-op anyway
> - For other values, the msb is flipped; flipping twice results in the original value again.
>
> Similar changes could be made to the corresponding vector nodes. If you want, I can do that in a follow-up RFE.
>
> One note: During benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like
>
>     int v = 0;
>     for (int datum : data) {
>         v ^= Integer.reverseBytes(Integer.reverseBytes(datum));
>     }
>     return v;
>
> was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases.

Nice refinement @SirYwell!

You are mentioning that

> During benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like
>
>     int v = 0;
>     for (int datum : data) {
>         v ^= Integer.reverseBytes(Integer.reverseBytes(datum));
>     }
>     return v;
> was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases.

I'm not totally sure I fully get what you mean here: does this optimization hinder vectorization in some cases? Does this result in a slowdown? (BTW do you have benchmark results?) Should we possibly try to detect this early and avoid simplifying?

-------------

PR Review: https://git.openjdk.org/jdk/pull/23851#pullrequestreview-2653511319

From duke at openjdk.org Mon Mar 3 09:35:58 2025
From: duke at openjdk.org (Marc Chevalier)
Date: Mon, 3 Mar 2025 09:35:58 GMT
Subject: Integrated: 8349523: Unused runtime calls to drem/frem should be removed
In-Reply-To:
References:
Message-ID:

On Wed, 19 Feb 2025 12:47:03 GMT, Marc Chevalier wrote:

> Remove frem and drem macros nodes when the result is not used. These nodes have other outputs (like memory), which is not meaningful, but preventing them to be dropped so easily. This patch removes the useless frem/drem nodes, and by rewiring the inputs to the outputs.
> > Thanks, > Marc This pull request has now been integrated. Changeset: 4109c73a Author: Marc Chevalier URL: https://git.openjdk.org/jdk/commit/4109c73a78c424d409e9fdd96913a772467666c8 Stats: 253 lines in 9 files changed: 236 ins; 0 del; 17 mod 8349523: Unused runtime calls to drem/frem should be removed Reviewed-by: thartmann, kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/23694 From rcastanedalo at openjdk.org Mon Mar 3 09:40:54 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 3 Mar 2025 09:40:54 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v5] In-Reply-To: References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: <0bOvNv6RcedfReD-MQjyFr50E-3vXMHINOHyBzBUeGc=.056d22a1-f7e7-48f1-a25c-aa3c02409910@github.com> On Fri, 28 Feb 2025 09:00:40 GMT, Daniel Lund?n wrote: >> When searching for load anti-dependences in GCM, the memory state for the load is sometimes represented not only by the memory node input of the load, but also other memory nodes. Because PhaseCFG::insert_anti_dependences searches for anti-dependences only from the load's memory input, it is, therefore, possible to sometimes overlook anti-dependences. The result is that loads are potentially scheduled too late, after stores that redefine the memory states of the loads. >> >> ### Changeset >> >> It is not yet clear why multiple nodes sometimes represent the memory state of a load, nor if this is expected. We can, however, resolve all the miscompiled test cases seen in this issue by improving the idealization of Phi nodes. Specifically, there is an idealization where we split Phis through input MergeMems, that we, prior to this changeset, applied too conservatively. >> >> To illustrate the idealization and how it resolves this issue, consider the example below. 
>> >> ![failure-graph-1](https://github.com/user-attachments/assets/ecbd204f-bdf0-49cb-a62e-8081d08cfe0c) >> >> `64 membar_release` is a critical anti-dependence for `183 loadI`. The anti-dependence search starts at the load's direct memory input, `107 Phi`, and stops immediately at Phis. Therefore, the search ends at `106 Phi` and we never find `64 membar_release`. >> >> We can apply the split-through-MergeMem Phi idealization to `119 Phi`. This idealization pushes `119 Phi` through `120 MergeMem` and `121 MergeMem`, splitting it into the individual inputs of the MergeMems in the process. As a result, we replace `119 Phi` with two new Phis. One of these generated Phis has identical inputs to `107 Phi` (`106 Phi` and `104 Phi`), and further idealizations will merge this new Phi and `107 Phi`. As a result, `107 Phi` then has a Phi-free path to `64 membar_release` and we correctly discover the anti-dependence. >> >> The changeset consists of the following changes. >> - Add an analysis that allows applying the split-through-MergeMem idealization in more cases than before (including in the above example) while still ensuring termination. >> - Add a missing `ResourceMark` in `PhiNode::split_out_instance`. >> - Add multiple new regression tests in `TestGCMLoadPlacement.java`. >> >> For reference, [here](https://github.com/openjdk/jdk/pull/22852) is a previous PR with an alternative fix that we decided to discard in favor of the fix in this PR. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/ac... > > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Update after Christian's review Marked as reviewed by rcastanedalo (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23691#pullrequestreview-2653564689 From bkilambi at openjdk.org Mon Mar 3 09:46:51 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 3 Mar 2025 09:46:51 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: <17arBLP__LxvP0MS9liI_o9HTVpzxDJrjy3LjYNn8Ng=.8be17d07-09b0-4d31-b41f-66d3c9be5fad@github.com> On Wed, 26 Feb 2025 01:18:57 GMT, Xiaohong Gong wrote: > The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. > > Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. > > This patch added the rearrange support for vector types with small lane count. Here are the main changes: > - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) > - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation > - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one > > Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. 
>
> 1) NEON
>
> JMH on panama-vector:vectorIntrinsics:
>
> Benchmark                     (size)  Mode   Cnt  Units   Before  After     Gain
> Double128Vector.rearrange     1024    thrpt  30   ops/ms  78.060  578.859   7.42x
> Double128Vector.sliceUnary    1024    thrpt  30   ops/ms  72.332  1811.664  25.05x
> Double128Vector.unsliceUnary  1024    thrpt  30   ops/ms  72.256  1812.344  25.08x
> Float64Vector.rearrange       1024    thrpt  30   ops/ms  77.879  558.797   7.18x
> Float64Vector.sliceUnary      1024    thrpt  30   ops/ms  70.528  1981.304  28.09x
> Float64Vector.unsliceUnary    1024    thrpt  30   ops/ms  71.735  1994.168  27.79x
> Int64Vector.rearrange         1024    thrpt  30   ops/ms  76.374  562.106   7.36x
> Int64Vector.sliceUnary        1024    thrpt  30   ops/ms  71.680  1190.127  16.60x
> Int64Vector.unsliceUnary      1024    thrpt  30   ops/ms  71.895  1185.094  16.48x
> Long128Vector.rearrange       1024    thrpt  30   ops/ms  78.902  579.250   7.34x
> Long128Vector.sliceUnary      1024    thrpt  30   ops/ms  72.389  747.794   10.33x
> Long128Vector.unsliceUnary    1024    thrpt  30   ops/ms  71.999  747.848   10.38x
>
>
> JMH on jdk mainline:
>
> Benchmark ...

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2595:

> 2593:   // type B/S/I/L/F/D, and the offset between two types is 16; Hence
> 2594:   // the offset for L is 48.
> 2595:   lea(rscratch1,

Hi @XiaohongGong , thanks for adding support for 2D/2L as well. I was trying to implement the same for the two vector table and I am wondering what you think of this implementation -

    negr(dst, shuffle);          // this would help create a mask. If input is 1, it would be all 1s and all 0s if its 0
    dup(tmp1, src1, 0);          // duplicate first element of src1
    dup(tmp2, src2, 1);          // duplicate second element of src2
    bsl(dst, T16B, tmp2, tmp1);  // Select from tmp2 if dst is 1 and from tmp1 if dst is 0

I am really not sure which implementation would be faster though. This implementation might take around 8 cycles.
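
Editorial illustration (not part of the original message): the mask-and-select idea proposed above can be modelled in scalar Java. This is only a sketch under stated assumptions. The class name `TwoLaneSelect` and the method `select` are hypothetical, each lane is modelled as a `long`, and every shuffle index is assumed to be 0 or 1. It mirrors the four steps of the proposal one for one and is not the AArch64 code from either PR.

```java
import java.util.Arrays;

public class TwoLaneSelect {
    // Scalar model of the proposed sequence for two 64-bit lanes.
    // Assumption: every shuffle index is either 0 or 1.
    static long[] select(long[] src1, long[] src2, long[] shuffle) {
        long lane0 = src1[0]; // dup(tmp1, src1, 0): broadcast first element of src1
        long lane1 = src2[1]; // dup(tmp2, src2, 1): broadcast second element of src2
        long[] dst = new long[2];
        for (int i = 0; i < 2; i++) {
            long mask = -shuffle[i]; // negr: 0 stays 0, 1 becomes all ones
            // bsl: take bits from lane1 where the mask is set, otherwise from lane0
            dst[i] = (mask & lane1) | (~mask & lane0);
        }
        return dst;
    }

    public static void main(String[] args) {
        long[] v = {10L, 20L};
        // Passing the same vector twice models a plain two-lane rearrange.
        System.out.println(Arrays.toString(select(v, v, new long[] {1L, 0L}))); // prints [20, 10]
    }
}
```

The negation works as a mask generator only because the indices are restricted to 0 and 1; with wider index ranges a different mask construction would be needed.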
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r1977187103 From xgong at openjdk.org Mon Mar 3 09:51:06 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 3 Mar 2025 09:51:06 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: <17arBLP__LxvP0MS9liI_o9HTVpzxDJrjy3LjYNn8Ng=.8be17d07-09b0-4d31-b41f-66d3c9be5fad@github.com> References: <17arBLP__LxvP0MS9liI_o9HTVpzxDJrjy3LjYNn8Ng=.8be17d07-09b0-4d31-b41f-66d3c9be5fad@github.com> Message-ID: On Mon, 3 Mar 2025 09:44:37 GMT, Bhavana Kilambi wrote: >> The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. >> >> Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. >> >> This patch added the rearrange support for vector types with small lane count. Here are the main changes: >> - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) >> - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation >> - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one >> >> Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. 
Performance of the same JMH with other vector types remains unchanged. >> >> 1) NEON >> >> JMH on panama-vector:vectorIntrinsics: >> >> Benchmark (size) Mode Cnt Units Before After Gain >> Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x >> Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x >> Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x >> Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x >> Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x >> Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x >> Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x >> Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x >> Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x >> Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x >> Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x >> Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.... > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2595: > >> 2593: // type B/S/I/L/F/D, and the offset between two types is 16; Hence >> 2594: // the offset for L is 48. >> 2595: lea(rscratch1, > > Hi @XiaohongGong , thanks for adding support for 2D/2L as well. I was trying to implement the same for the two vector table and I am wondering what you think of this implementation - > > negr(dst, shuffle); // this would help create a mask. If input is 1, it would be all 1s and all 0s if its 0 > dup(tmp1, src1, 0); // duplicate first element of src1 > dup(tmp2, src2, 1); // duplicate second element of src2 > bsl(dst, T16B, tmp2, tmp1); // Select from tmp2 if dst is 1 and from tmp1 if dst is 0 > > > > I am really not sure which implementation would be faster though. This implementation might take around 8 cycles. Sounds good to me. I will try with this solution and compare the performance on my Grace CPU. Thanks for this advice! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r1977192725 From adinn at openjdk.org Mon Mar 3 10:11:38 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 3 Mar 2025 10:11:38 GMT Subject: RFR: 8350893: Use generated names for hand generated opto runtime blobs [v2] In-Reply-To: References: Message-ID: > The two special case opto runtime blobs that support uncommon trap and exception handling are currently being generated using hard wired blob names determined by port-specific code. They should employ the standard blob names generated from shared declarations in file stubDeclarations.hpp. Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: correct error in arm stub name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23829/files - new: https://git.openjdk.org/jdk/pull/23829/files/95c299a8..526a8eea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23829&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23829&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23829.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23829/head:pull/23829 PR: https://git.openjdk.org/jdk/pull/23829 From adinn at openjdk.org Mon Mar 3 10:11:39 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 3 Mar 2025 10:11:39 GMT Subject: RFR: 8350893: Use generated names for hand generated opto runtime blobs [v2] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 20:02:13 GMT, Aleksey Shipilev wrote: >> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: >> >> correct error in arm stub name > > Hm. It looks like in release builds all of these would be "runtime stub"? Is that expected? 
>
>
> const char* OptoRuntime::stub_name(address entry) {
> #ifndef PRODUCT
>   CodeBlob* cb = CodeCache::find_blob(entry);
>   RuntimeStub* rs =(RuntimeStub *)cb;
>   assert(rs != nullptr && rs->is_runtime_stub(), "not a runtime stub");
>   return rs->name();
> #else
>   // Fast implementation for product mode (maybe it should be inlined too)
>   return "runtime stub";
> #endif
> }

@shipilev
> Hm. It looks like in release builds all of these would be "runtime stub"? Is that expected?

git blame indicates that it's been like that since jdk7 was initially loaded in 2007-12-01 ;-)

We could perhaps fix it to say something more useful. Maybe in a separate PR in case something depends on it having that value in release builds?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23829#issuecomment-2693858294

From adinn at openjdk.org Mon Mar 3 10:11:39 2025
From: adinn at openjdk.org (Andrew Dinn)
Date: Mon, 3 Mar 2025 10:11:39 GMT
Subject: RFR: 8350893: Use generated names for hand generated opto runtime blobs
In-Reply-To:
References:
Message-ID:

On Sat, 1 Mar 2025 19:55:11 GMT, Vladimir Kozlov wrote:

>> The two special case opto runtime blobs that support uncommon trap and exception handling are currently being generated using hard wired blob names determined by port-specific code. They should employ the standard blob names generated from shared declarations in file stubDeclarations.hpp.
>
> @adinn I am printing assembler code (`-XX:CompileCommand=print,Test::test`) on Mac M1 for nmethod and see this with latest JDK:
>
>   0x000000010ca8c468: movz x8, #0x6d00           ; {runtime_call StubRoutines (finalstubs)}
>   0x000000010ca8c46c: movk x8, #0xca0, lsl #16
>   0x000000010ca8c470: movk x8, #0x1, lsl #32
>   0x000000010ca8c474: blr x8
>
>
> before (jdk 23) it was:
>
>
>   0x000000011115fe2c: movz x8, #0xf340           ; {runtime_call Stub::nmethod_entry_barrier}
>   0x000000011115fe30: movk x8, #0x1106, lsl #16
>   0x000000011115fe34: movk x8, #0x1, lsl #32
>   0x000000011115fe38: blr x8
>
>
>
> It could be something changed in code generation but **StubRoutines (finalstubs)** is not helpful.

@vnkozlov
> It could be something changed in code generation but StubRoutines (finalstubs) is not helpful.

Agreed. Although, I don't think that relates to this change. I am investigating.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23829#issuecomment-2693862546

From adinn at openjdk.org Mon Mar 3 10:11:39 2025
From: adinn at openjdk.org (Andrew Dinn)
Date: Mon, 3 Mar 2025 10:11:39 GMT
Subject: RFR: 8350893: Use generated names for hand generated opto runtime blobs [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 28 Feb 2025 17:28:34 GMT, Vladimir Kozlov wrote:

>> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision:
>>
>> correct error in arm stub name
>
> src/hotspot/cpu/arm/runtime_arm.cpp line 210:
>
>> 208:   // setup code generation tools
>> 209:   // Measured 8/7/03 at 256 in 32bit debug build
>> 210:   const char* name = OptoRuntime::stub_name(OptoStubId::uncommon_trap_id);
>
> Typo. Should be `exception_id`

Thanks for catching that. Fixed.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23829#discussion_r1977227182

From mli at openjdk.org Mon Mar 3 10:35:33 2025
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 3 Mar 2025 10:35:33 GMT
Subject: RFR: 8351033: RISC-V: TestFloat16ScalarOperations asserts with offset (4210) is too large to be patched in one beq/bge/bgeu/blt/bltu/bne instruction!
Message-ID:

Hi,
Can you help review this patch?
This patch could also fix JDK-8340884.

[JDK-8351033](https://bugs.openjdk.org/browse/JDK-8351033) is quite similar to [JDK-8340884](https://bugs.openjdk.org/browse/JDK-8340884). The difference is that in [JDK-8340884](https://bugs.openjdk.org/browse/JDK-8340884) the offset is much larger: 134990567309312.

Thanks

-------------

Commit messages:
 - initial commit

Changes: https://git.openjdk.org/jdk/pull/23856/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23856&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8351033
  Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/23856.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23856/head:pull/23856

PR: https://git.openjdk.org/jdk/pull/23856

From dlunden at openjdk.org Mon Mar 3 10:51:58 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 3 Mar 2025 10:51:58 GMT
Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v7]
In-Reply-To:
References:
Message-ID: <8FcQbTMVTexG2nSfmR1k9U1e3rpQqsJUr32xYV8u3rE=.aa4d8709-647d-4dc5-be80-e02b4cff1223@github.com>

On Mon, 14 Oct 2024 14:17:09 GMT, Quan Anh Mai wrote:

>> Hi,
>>
>> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block.
>> This introduces loop-carried dependencies, greatly reducing loop throughput.
>>
>> My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites; this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the other hand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies.
>>
>> A downside of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt.
>>
>> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive.
>> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly.
>> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that.
>>
>> Please take a look and leave your reviews, thanks a lot.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix uncommon_freq

Keep alive. Reviewing this now (apologies for the long delay).
------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2693969703 From tschatzl at openjdk.org Mon Mar 3 12:11:02 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 12:11:02 GMT Subject: RFR: 8350956: Fix repetitions of the word "the" in compiler component comments Message-ID: Hi all, please review this trivial change that fixes "the the" repetitions in the compiler related sources. If you think it's not worth fixing, I am okay with that and just retract the change. Testing: gha Thanks, Thomas ------------- Commit messages: - 8350956 Changes: https://git.openjdk.org/jdk/pull/23858/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23858&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350956 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23858.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23858/head:pull/23858 PR: https://git.openjdk.org/jdk/pull/23858 From yzheng at openjdk.org Mon Mar 3 12:10:59 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 3 Mar 2025 12:10:59 GMT Subject: RFR: 8315488: Remove outdated and unused ciReplay support from SA [v7] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 18:10:26 GMT, Coleen Phillimore wrote: >> This change removes the ci, c1 and c2 compiler code from the serviceability agent. The ciReplay functionality is supported inside the jvm and this duplicated functionality in SA had bit rotted so is removed. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix COMPILER2 preprocessor constant for SA. JVMCI changes LGTM ------------- Marked as reviewed by yzheng (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/23782#pullrequestreview-2653911941 From coleenp at openjdk.org Mon Mar 3 12:11:00 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Mar 2025 12:11:00 GMT Subject: RFR: 8315488: Remove outdated and unused ciReplay support from SA [v7] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 18:10:26 GMT, Coleen Phillimore wrote: >> This change removes the ci, c1 and c2 compiler code from the serviceability agent. The ciReplay functionality is supported inside the jvm and this duplicated functionality in SA had bit rotted so is removed. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix COMPILER2 preprocessor constant for SA. Thank you for reviewing Chris, Vladimir and Yudi. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23782#issuecomment-2694143147 From coleenp at openjdk.org Mon Mar 3 12:11:03 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Mar 2025 12:11:03 GMT Subject: Integrated: 8315488: Remove outdated and unused ciReplay support from SA In-Reply-To: References: Message-ID: <-j80eG15F0UcrdIeoOpXnPIxNWXoYp-7mngi5jeI7SE=.c36e2006-972f-4206-86ca-bcd0b3d92466@github.com> On Tue, 25 Feb 2025 16:52:09 GMT, Coleen Phillimore wrote: > This change removes the ci, c1 and c2 compiler code from the serviceability agent. The ciReplay functionality is supported inside the jvm and this duplicated functionality in SA had bit rotted so is removed. > Tested with tier1-4. This pull request has now been integrated. 
Changeset: 8b0468fa Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/8b0468faf1c38f2d1d887ab92b76dfff625482ef Stats: 6022 lines in 109 files changed: 2 ins; 5837 del; 183 mod 8315488: Remove outdated and unused ciReplay support from SA Reviewed-by: kvn, cjplummer, yzheng ------------- PR: https://git.openjdk.org/jdk/pull/23782 From epeter at openjdk.org Mon Mar 3 12:12:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Mar 2025 12:12:56 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v5] In-Reply-To: References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: On Fri, 28 Feb 2025 09:00:40 GMT, Daniel Lund?n wrote: >> When searching for load anti-dependences in GCM, the memory state for the load is sometimes represented not only by the memory node input of the load, but also other memory nodes. Because PhaseCFG::insert_anti_dependences searches for anti-dependences only from the load's memory input, it is, therefore, possible to sometimes overlook anti-dependences. The result is that loads are potentially scheduled too late, after stores that redefine the memory states of the loads. >> >> ### Changeset >> >> It is not yet clear why multiple nodes sometimes represent the memory state of a load, nor if this is expected. We can, however, resolve all the miscompiled test cases seen in this issue by improving the idealization of Phi nodes. Specifically, there is an idealization where we split Phis through input MergeMems, that we, prior to this changeset, applied too conservatively. >> >> To illustrate the idealization and how it resolves this issue, consider the example below. >> >> ![failure-graph-1](https://github.com/user-attachments/assets/ecbd204f-bdf0-49cb-a62e-8081d08cfe0c) >> >> `64 membar_release` is a critical anti-dependence for `183 loadI`. 
The anti-dependence search starts at the load's direct memory input, `107 Phi`, and stops immediately at Phis. Therefore, the search ends at `106 Phi` and we never find `64 membar_release`. >> >> We can apply the split-through-MergeMem Phi idealization to `119 Phi`. This idealization pushes `119 Phi` through `120 MergeMem` and `121 MergeMem`, splitting it into the individual inputs of the MergeMems in the process. As a result, we replace `119 Phi` with two new Phis. One of these generated Phis has identical inputs to `107 Phi` (`106 Phi` and `104 Phi`), and further idealizations will merge this new Phi and `107 Phi`. As a result, `107 Phi` then has a Phi-free path to `64 membar_release` and we correctly discover the anti-dependence. >> >> The changeset consists of the following changes. >> - Add an analysis that allows applying the split-through-MergeMem idealization in more cases than before (including in the above example) while still ensuring termination. >> - Add a missing `ResourceMark` in `PhiNode::split_out_instance`. >> - Add multiple new regression tests in `TestGCMLoadPlacement.java`. >> >> For reference, [here](https://github.com/openjdk/jdk/pull/22852) is a previous PR with an alternative fix that we decided to discard in favor of the fix in this PR. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/ac... > > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Update after Christian's review @dlunde This looks reasonable :) I have a question about compile time below. Also: do you still intend to add some kind of verification code to catch these kinds of missing anti-dependnece edges? That should probably be a separate RFE if we do it at all. src/hotspot/share/opto/cfgnode.cpp line 2061: > 2059: // non-termination. For more details, see comments at the call site in > 2060: // PhiNode::Ideal. 
This is really a const method, but Node_List currently only > 2061: // permits non-const elements. Which `Node_List` is this about? Would `GrowableArray` be a good alternative? src/hotspot/share/opto/cfgnode.cpp line 2062: > 2060: // PhiNode::Ideal. This is really a const method, but Node_List currently only > 2061: // permits non-const elements. > 2062: bool PhiNode::is_split_through_mergemem_terminating() { This method could walk a significant part of the graph, right? And since this happens during IGVN, this could happen repeatedly, correct? Often we have limits on traversals, but maybe we don't want that here. I'm just wondering if this could have an impact on compile time. But then again: I don't know if there is even an alternative? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23691#pullrequestreview-2653878700 PR Review Comment: https://git.openjdk.org/jdk/pull/23691#discussion_r1977365361 PR Review Comment: https://git.openjdk.org/jdk/pull/23691#discussion_r1977402218 From epeter at openjdk.org Mon Mar 3 12:12:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Mar 2025 12:12:57 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v5] In-Reply-To: References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: On Mon, 3 Mar 2025 11:42:24 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Update after Christian's review > > src/hotspot/share/opto/cfgnode.cpp line 2061: > >> 2059: // non-termination. For more details, see comments at the call site in >> 2060: // PhiNode::Ideal. This is really a const method, but Node_List currently only >> 2061: // permits non-const elements. > > Which `Node_List` is this about? Would `GrowableArray` be a good alternative?
Probably not important enough. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23691#discussion_r1977393124 From rcastanedalo at openjdk.org Mon Mar 3 12:14:52 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 3 Mar 2025 12:14:52 GMT Subject: RFR: 8350956: Fix repetitions of the word "the" in compiler component comments In-Reply-To: References: Message-ID: <_qeU9w886YHSBLxN-IunDO6ted4cBnT54IIEtvxqXi8=.38195df3-0e5e-43fb-8bb4-bf3435cba607@github.com> On Mon, 3 Mar 2025 11:07:41 GMT, Thomas Schatzl wrote: > Hi all, > > please review this trivial change that fixes "the the" repetitions in the > compiler related sources. > > If you think it's not worth fixing, I am okay with that and just retract the change. > > Testing: gha > > Thanks, > Thomas Looks good and trivial! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23858#pullrequestreview-2653954556 From rcastanedalo at openjdk.org Mon Mar 3 12:20:53 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 3 Mar 2025 12:20:53 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v5] In-Reply-To: References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: On Mon, 3 Mar 2025 12:10:12 GMT, Emanuel Peter wrote: > Also: do you still intend to add some kind of verification code to catch these kinds of missing anti-dependence edges? That should probably be a separate RFE if we do it at all. Hi Emanuel, I'm working on that in [JDK-8349930](https://bugs.openjdk.org/browse/JDK-8349930).
------------- PR Comment: https://git.openjdk.org/jdk/pull/23691#issuecomment-2694185227 From fyang at openjdk.org Mon Mar 3 12:30:52 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 3 Mar 2025 12:30:52 GMT Subject: RFR: 8351033: RISC-V: TestFloat16ScalarOperations asserts with offset (4210) is too large to be patched in one beq/bge/bgeu/blt/bltu/bne instruction! In-Reply-To: References: Message-ID: <_1Q_UpmhBlYxP4To2bYdXImf7O2oVnCCbSK5cZYC9bQ=.db184029-ebc5-4dda-a5db-834942f2f078@github.com> On Mon, 3 Mar 2025 10:30:54 GMT, Hamlin Li wrote: > Hi, > Could you help review this patch? > This patch could also fix JDK-8340884. > [JDK-8351033](https://bugs.openjdk.org/browse/JDK-8351033) is quite similar to [JDK-8340884](https://bugs.openjdk.org/browse/JDK-8340884). > The difference is that in [JDK-8340884](https://bugs.openjdk.org/browse/JDK-8340884), the offset is much larger: 134990567309312. > > Thanks Thanks! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23856#pullrequestreview-2653992383 From tschatzl at openjdk.org Mon Mar 3 12:33:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 12:33:56 GMT Subject: RFR: 8350956: Fix repetitions of the word "the" in compiler component comments In-Reply-To: <_qeU9w886YHSBLxN-IunDO6ted4cBnT54IIEtvxqXi8=.38195df3-0e5e-43fb-8bb4-bf3435cba607@github.com> References: <_qeU9w886YHSBLxN-IunDO6ted4cBnT54IIEtvxqXi8=.38195df3-0e5e-43fb-8bb4-bf3435cba607@github.com> Message-ID: On Mon, 3 Mar 2025 12:12:30 GMT, Roberto Castañeda Lozano wrote: >> Hi all, >> >> please review this trivial change that fixes "the the" repetitions in the >> compiler related sources. >> >> If you think it's not worth fixing, I am okay with that and just retract the change. >> >> Testing: gha >> >> Thanks, >> Thomas > > Looks good and trivial!
Thanks @robcasloz for your review ------------- PR Comment: https://git.openjdk.org/jdk/pull/23858#issuecomment-2694215730 From tschatzl at openjdk.org Mon Mar 3 12:33:57 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 12:33:57 GMT Subject: Integrated: 8350956: Fix repetitions of the word "the" in compiler component comments In-Reply-To: References: Message-ID: <9Xs26rvFpf6FkbQkFXXpwMJZpT4SdCUJBWzXmF1lPyE=.12f7076d-af81-4fd8-96f0-491c62b0d11e@github.com> On Mon, 3 Mar 2025 11:07:41 GMT, Thomas Schatzl wrote: > Hi all, > > please review this trivial change that fixes "the the" repetitions in the > compiler related sources. > > If you think it's not worth fixing, I am okay with that and just retract the change. > > Testing: gha > > Thanks, > Thomas This pull request has now been integrated. Changeset: 30b0c609 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/30b0c6098028cce63e65bd9d563973f2774fa74d Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod 8350956: Fix repetitions of the word "the" in compiler component comments Reviewed-by: rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/23858 From duke at openjdk.org Mon Mar 3 12:57:05 2025 From: duke at openjdk.org (Matthias Ernst) Date: Mon, 3 Mar 2025 12:57:05 GMT Subject: RFR: 8346664: C2: Optimize mask check with constant offset [v24] In-Reply-To: References: Message-ID: <2ZByVojhoirjQexK-F1HfATSk1ZO9HIqD3zLl_issP0=.fc23d8e8-f414-4717-95aa-0c4c0b4c4fdb@github.com> On Fri, 14 Feb 2025 07:18:52 GMT, Matthias Ernst wrote: >> Fixes [JDK-8346664](https://bugs.openjdk.org/browse/JDK-8346664): extends the optimization of masked sums introduced in #6697 to cover constant values, which currently break the optimization. 
>> >> Such constant values arise in an expression of the following form, for example from `MemorySegmentImpl#isAlignedForElement`: >> >> >> (base + (index + 1) << 8) & 255 >> => MulNode >> (base + (index << 8 + 256)) & 255 >> => AddNode >> ((base + index << 8) + 256) & 255 >> >> >> Currently, `256` is not being recognized as a shifted value. This PR enables further reduction: >> >> >> ((base + index << 8) + 256) & 255 >> => MulNode (this PR) >> (base + index << 8) & 255 >> => MulNode (PR #6697) >> base & 255 (loop invariant) >> >> >> Implementation notes: >> * I verified that the originating issue "scaled varhandle indexed with i+1" (https://mail.openjdk.org/pipermail/panama-dev/2024-December/020835.html) is resolved with this PR. >> * ~in order to stay with the flow of the current implementation, I refrained from solving general (const & mask)==0 cases, but only those where const == _ << shift.~ >> * ~I modified existing test cases adding/subtracting from the index var (which would fail with current C2). Let me know if would like to see separate cases for these.~ > > Matthias Ernst has updated the pull request incrementally with one additional commit since the last revision: > > incorporate @eme64's comment suggestions Concretely, my understanding of what's breaking here is the following: `AndI ( CastII ( ConI ) )` : `AndI` reaches through the cast and, since 8346664, also recognizes constants in addition to shifts and can eliminate. This currently gets stuck on the Cast node in CCP. Pushing the And node fixes the crash and looks straightforward, but it makes me wonder why it is necessary, and why the change doesn't bubble through the Cast already: shouldn't the cast node be re-pushed when its input changes,`CastII(Con).Value()` now return the constant, and then bubble further to the And? 
It appears to me that either there's something off in Cast's Value handling (there's more to it than expected ;-), OR - if it is somehow necessary to preserve this cast - it is not clear to me why And (and others) reaching _through_ the Cast is legal in the first place. Does this make sense? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22856#issuecomment-2694307169 From dlunden at openjdk.org Mon Mar 3 13:10:56 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 3 Mar 2025 13:10:56 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v6] In-Reply-To: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: > When searching for load anti-dependences in GCM, the memory state for the load is sometimes represented not only by the memory node input of the load, but also other memory nodes. Because PhaseCFG::insert_anti_dependences searches for anti-dependences only from the load's memory input, it is, therefore, possible to sometimes overlook anti-dependences. The result is that loads are potentially scheduled too late, after stores that redefine the memory states of the loads. > > ### Changeset > > It is not yet clear why multiple nodes sometimes represent the memory state of a load, nor if this is expected. We can, however, resolve all the miscompiled test cases seen in this issue by improving the idealization of Phi nodes. Specifically, there is an idealization where we split Phis through input MergeMems, that we, prior to this changeset, applied too conservatively. > > To illustrate the idealization and how it resolves this issue, consider the example below. 
> > ![failure-graph-1](https://github.com/user-attachments/assets/ecbd204f-bdf0-49cb-a62e-8081d08cfe0c) > > `64 membar_release` is a critical anti-dependence for `183 loadI`. The anti-dependence search starts at the load's direct memory input, `107 Phi`, and stops immediately at Phis. Therefore, the search ends at `106 Phi` and we never find `64 membar_release`. > > We can apply the split-through-MergeMem Phi idealization to `119 Phi`. This idealization pushes `119 Phi` through `120 MergeMem` and `121 MergeMem`, splitting it into the individual inputs of the MergeMems in the process. As a result, we replace `119 Phi` with two new Phis. One of these generated Phis has identical inputs to `107 Phi` (`106 Phi` and `104 Phi`), and further idealizations will merge this new Phi and `107 Phi`. As a result, `107 Phi` then has a Phi-free path to `64 membar_release` and we correctly discover the anti-dependence. > > The changeset consists of the following changes. > - Add an analysis that allows applying the split-through-MergeMem idealization in more cases than before (including in the above example) while still ensuring termination. > - Add a missing `ResourceMark` in `PhiNode::split_out_instance`. > - Add multiple new regression tests in `TestGCMLoadPlacement.java`. > > For reference, [here](https://github.com/openjdk/jdk/pull/22852) is a previous PR with an alternative fix that we decided to discard in favor of the fix in this PR. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/13394882532) > - `tier1` to `tier4` (an... 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Change to GrowableArray ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23691/files - new: https://git.openjdk.org/jdk/pull/23691/files/892bf5f6..a4c1a15e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23691&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23691&range=04-05 Stats: 7 lines in 2 files changed: 0 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23691.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23691/head:pull/23691 PR: https://git.openjdk.org/jdk/pull/23691 From dlunden at openjdk.org Mon Mar 3 13:10:57 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 3 Mar 2025 13:10:57 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v5] In-Reply-To: References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: <9QfpsIxlw7SvP0jYJtViqGuzJ1Y9vgjbiVafIAhXKUs=.04b6847e-d11f-45ce-8b6d-3c0a0173a6e8@github.com> On Mon, 3 Mar 2025 12:01:03 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/cfgnode.cpp line 2061: >> >>> 2059: // non-termination. For more details, see comments at the call site in >>> 2060: // PhiNode::Ideal. This is really a const method, but Node_List currently only >>> 2061: // permits non-const elements. >> >> Which `Node_List` is this about? Would `GrowableArray` be a good alternative? > > Probably not important enough. Thanks, I agree `GrowableArray` is a better option (now changed). It annoyed me that I couldn't make the function const due to `Node_List`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23691#discussion_r1977484031 From dlunden at openjdk.org Mon Mar 3 13:20:59 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 3 Mar 2025 13:20:59 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v5] In-Reply-To: References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: On Mon, 3 Mar 2025 12:02:46 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Update after Christian's review > > src/hotspot/share/opto/cfgnode.cpp line 2062: > >> 2060: // PhiNode::Ideal. This is really a const method, but Node_List currently only >> 2061: // permits non-const elements. >> 2062: bool PhiNode::is_split_through_mergemem_terminating() { > > This method could walk a significant part of the graph, right? And since this happens during IGVN, this could happen repeatedly, correct? > > Often we have limits on traversals, but maybe we don't want that here. > > I'm just wondering if this could have an impact on compile time. But then again: I don't know if there is even an alternative? Yes, in theory it could walk a significant part of the graph. However, it just walks bot `MergeMem`s and `Phi`s, and stops at actual defining stores. Assuming that the defining stores of `Phi`s are generally close to the `Phi`, it should run fairly quickly in practice. Given that we now know that this idealization is needed for soundness, adding a limit sounds dangerous. But I guess we could add a limit after which we bail out (and perhaps assert). I ran C2 compilation speed testing on DaCapo and didn't see any statistically significant regression (although there's quite a bit of noise).
Just looking at the mean compilation time across all benchmarks, there is a slight regression. On Linux x64 (which is the least noisy), there was a 0.84% mean regression in C2 compilation time. I could also run some more experiments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23691#discussion_r1977505529 From epeter at openjdk.org Mon Mar 3 13:29:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Mar 2025 13:29:54 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v6] In-Reply-To: References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: On Mon, 3 Mar 2025 13:10:56 GMT, Daniel Lund?n wrote: >> When searching for load anti-dependences in GCM, the memory state for the load is sometimes represented not only by the memory node input of the load, but also other memory nodes. Because PhaseCFG::insert_anti_dependences searches for anti-dependences only from the load's memory input, it is, therefore, possible to sometimes overlook anti-dependences. The result is that loads are potentially scheduled too late, after stores that redefine the memory states of the loads. >> >> ### Changeset >> >> It is not yet clear why multiple nodes sometimes represent the memory state of a load, nor if this is expected. We can, however, resolve all the miscompiled test cases seen in this issue by improving the idealization of Phi nodes. Specifically, there is an idealization where we split Phis through input MergeMems, that we, prior to this changeset, applied too conservatively. >> >> To illustrate the idealization and how it resolves this issue, consider the example below. >> >> ![failure-graph-1](https://github.com/user-attachments/assets/ecbd204f-bdf0-49cb-a62e-8081d08cfe0c) >> >> `64 membar_release` is a critical anti-dependence for `183 loadI`. 
The anti-dependence search starts at the load's direct memory input, `107 Phi`, and stops immediately at Phis. Therefore, the search ends at `106 Phi` and we never find `64 membar_release`. >> >> We can apply the split-through-MergeMem Phi idealization to `119 Phi`. This idealization pushes `119 Phi` through `120 MergeMem` and `121 MergeMem`, splitting it into the individual inputs of the MergeMems in the process. As a result, we replace `119 Phi` with two new Phis. One of these generated Phis has identical inputs to `107 Phi` (`106 Phi` and `104 Phi`), and further idealizations will merge this new Phi and `107 Phi`. As a result, `107 Phi` then has a Phi-free path to `64 membar_release` and we correctly discover the anti-dependence. >> >> The changeset consists of the following changes. >> - Add an analysis that allows applying the split-through-MergeMem idealization in more cases than before (including in the above example) while still ensuring termination. >> - Add a missing `ResourceMark` in `PhiNode::split_out_instance`. >> - Add multiple new regression tests in `TestGCMLoadPlacement.java`. >> >> For reference, [here](https://github.com/openjdk/jdk/pull/22852) is a previous PR with an alternative fix that we decided to discard in favor of the fix in this PR. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/ac... > > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Change to GrowableArray Marked as reviewed by epeter (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23691#pullrequestreview-2654129647 From epeter at openjdk.org Mon Mar 3 13:29:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Mar 2025 13:29:55 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v5] In-Reply-To: References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: <8d7jfMP9DHNkTdbDO1mM1Dcmy7t4ZbxX4Ih36lo_6NI=.ba85bb90-9b40-4e02-9f50-ef0b163b2a11@github.com> On Mon, 3 Mar 2025 13:18:42 GMT, Daniel Lund?n wrote: >> src/hotspot/share/opto/cfgnode.cpp line 2062: >> >>> 2060: // PhiNode::Ideal. This is really a const method, but Node_List currently only >>> 2061: // permits non-const elements. >>> 2062: bool PhiNode::is_split_through_mergemem_terminating() { >> >> This method could walk a significant part of the graph, right? And since this happens during IGVN, this could happen repeatedly, correct? >> >> Often we have limits on traversals, but maybe we don't want that here. >> >> I'm just wondering if this could have an impact on compile time. But then again: I don't know if there is even an alternative ? > > Yes, in theory it could walk a significant part of the graph. However, it just walks bot `MergeMem`s and `Phi`s, and stops at actual defining stores. With an assumption that the defining stores of `Phi`s are generally close to the `Phi`, it should run fairly quick in practice. Given that we now know that this idealization is needed for soundness, adding a limit sounds dangerous. But, I guess we could add a limit after which we bailout (and perhaps assert). > > I ran C2 compilation speed testing on DaCapo and didn't see any statistically significant regression (although there's quite a bit of noise). Just looking at the mean compilation time across all benchmarks, there is a slight regression. 
On Linux x64 (which is the least noisy), there was a 0.84% mean regression in C2 compilation time. I could also run some more experiments. I think that sounds reasonable. I'm not sure more experiments are warranted now; I'm happy with your due diligence as reported above ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23691#discussion_r1977517241 From dlunden at openjdk.org Mon Mar 3 13:49:22 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 3 Mar 2025 13:49:22 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v6] In-Reply-To: References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: <_dXv19_PEk_dCGPI-D5y0Mq6vLpljcQW1OjN4rNw7no=.d57ef499-3be6-4a43-9314-92dbf911e774@github.com> On Mon, 3 Mar 2025 13:10:56 GMT, Daniel Lundén wrote: >> When searching for load anti-dependences in GCM, the memory state for the load is sometimes represented not only by the memory node input of the load, but also other memory nodes. Because PhaseCFG::insert_anti_dependences searches for anti-dependences only from the load's memory input, it is, therefore, possible to sometimes overlook anti-dependences. The result is that loads are potentially scheduled too late, after stores that redefine the memory states of the loads. >> >> ### Changeset >> >> It is not yet clear why multiple nodes sometimes represent the memory state of a load, nor if this is expected. We can, however, resolve all the miscompiled test cases seen in this issue by improving the idealization of Phi nodes. Specifically, there is an idealization where we split Phis through input MergeMems, that we, prior to this changeset, applied too conservatively. >> >> To illustrate the idealization and how it resolves this issue, consider the example below.
>> >> ![failure-graph-1](https://github.com/user-attachments/assets/ecbd204f-bdf0-49cb-a62e-8081d08cfe0c) >> >> `64 membar_release` is a critical anti-dependence for `183 loadI`. The anti-dependence search starts at the load's direct memory input, `107 Phi`, and stops immediately at Phis. Therefore, the search ends at `106 Phi` and we never find `64 membar_release`. >> >> We can apply the split-through-MergeMem Phi idealization to `119 Phi`. This idealization pushes `119 Phi` through `120 MergeMem` and `121 MergeMem`, splitting it into the individual inputs of the MergeMems in the process. As a result, we replace `119 Phi` with two new Phis. One of these generated Phis has identical inputs to `107 Phi` (`106 Phi` and `104 Phi`), and further idealizations will merge this new Phi and `107 Phi`. As a result, `107 Phi` then has a Phi-free path to `64 membar_release` and we correctly discover the anti-dependence. >> >> The changeset consists of the following changes. >> - Add an analysis that allows applying the split-through-MergeMem idealization in more cases than before (including in the above example) while still ensuring termination. >> - Add a missing `ResourceMark` in `PhiNode::split_out_instance`. >> - Add multiple new regression tests in `TestGCMLoadPlacement.java`. >> >> For reference, [here](https://github.com/openjdk/jdk/pull/22852) is a previous PR with an alternative fix that we decided to discard in favor of the fix in this PR. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/ac... > > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Change to GrowableArray Thanks for the reviews! I will run some more sanity testing and benchmarking on the updated changeset before integrating. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23691#issuecomment-2694448534 From liach at openjdk.org Mon Mar 3 14:06:00 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 3 Mar 2025 14:06:00 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations In-Reply-To: References: Message-ID: On Sat, 1 Mar 2025 13:34:30 GMT, Hannes Greule wrote: > subnode has multiple nodes that are self-inverse but lack the respective optimization. ReverseINode and ReverseLNode already have the optimization, but we can deduplicate the code for all those operations. > > For most nodes, the optimization is obvious. The NegF/DNodes, however, are worth looking at in detail, imo: > - `Float.NaN` has the same bits set as `-Float.NaN`. That means, in this specific case, the operation is a no-op anyway. > - For other values, the msb is flipped; flipping twice results in the original value again. > > Similar changes could be made to the corresponding vector nodes. If you want, I can do that in a follow-up RFE. > > One note: During benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like > > int v = 0; > for (int datum : data) { > v ^= Integer.reverseBytes(Integer.reverseBytes(datum)); > } > return v; > > was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases. Per my understanding, this is an issue with autovectorization, and it should be fixed by fixing autovectorization instead of by blocking valid and sound simplifications.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23851#issuecomment-2694495187 From hgreule at openjdk.org Mon Mar 3 14:06:02 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 3 Mar 2025 14:06:02 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 09:20:11 GMT, Damon Fenacci wrote: > I'm not totally sure I fully get what you mean here: does this optimization hinder vectorization in some cases? Does this result in a slowdown? (BTW do you have benchmark results?) Should we possibly try to detect this early and avoid simplifying? What happens basically comes down to this check: https://github.com/openjdk/jdk/blob/885338b5f38ed05d8b91efc0178b371f2f89310e/src/hotspot/share/opto/superword.cpp#L1759 Without my change, `_num_work_vecs` is 3 (I assume, I didn't debug that part) as we have one load and two reverse bytes operations. `_num_reductions` is 1, the xor. With my change, when we come to this check, `_num_work_vecs` is 1 (That part I checked with the debugger), as we only have the load left. So superword does not consider vectorization to be profitable. My benchmark code: https://gist.github.com/SirYwell/a76578dc5f3c10cd08b768a3bd39a988 Results on my machine (Ryzen 9 3900X):

mainline
Benchmark                           Mode  Cnt     Score     Error   Units
DoubledReverseBytes.doubleReverse  thrpt    3  3287,042 ± 398,656  ops/ms
DoubledReverseBytes.folded         thrpt    3   418,627 ±  20,797  ops/ms

this pr
Benchmark                           Mode  Cnt     Score     Error   Units
DoubledReverseBytes.doubleReverse  thrpt    3   419,369 ±  24,974  ops/ms
DoubledReverseBytes.folded         thrpt    3   415,469 ±  88,714  ops/ms

You can see the almost 8x speedup due to vectorization that happens on mainline but not anymore with my change. I don't think this should block this change. Detecting such situations also seems like a rather complicated workaround.
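For readers who want to convince themselves that the operations discussed in this thread really are self-inverse at the Java level, here is a minimal sketch. It is not the PR's test (`InvolutionIdentityTests.java`) and says nothing about what C2 emits; the class name `InvolutionCheck` and its helper methods are made up for illustration. Note that it compares floats with `Float.floatToIntBits`, which canonicalizes NaN, so it only checks that double negation of NaN is still NaN, not that a particular NaN bit pattern survives.

```java
public class InvolutionCheck {
    // Hypothetical helper: negating a float twice should give back the same
    // value. floatToIntBits distinguishes 0.0f from -0.0f and maps every NaN
    // to one canonical bit pattern.
    static boolean negTwiceKeepsValue(float f) {
        float twice = -(-f);
        return Float.floatToIntBits(twice) == Float.floatToIntBits(f);
    }

    // Hypothetical helper: byte reversal and bit reversal applied twice
    // should both be the identity on int.
    static boolean reverseTwiceIsIdentity(int x) {
        return Integer.reverseBytes(Integer.reverseBytes(x)) == x
            && Integer.reverse(Integer.reverse(x)) == x;
    }

    // Same check for long.
    static boolean reverseTwiceIsIdentity(long x) {
        return Long.reverseBytes(Long.reverseBytes(x)) == x
            && Long.reverse(Long.reverse(x)) == x;
    }

    public static void main(String[] args) {
        float[] floats = {0.0f, -0.0f, 1.5f, -3.25f, Float.NaN,
                          Float.POSITIVE_INFINITY, Float.MIN_VALUE};
        int[] ints = {0, 1, -1, 0x12345678, Integer.MIN_VALUE, Integer.MAX_VALUE};
        long[] longs = {0L, 1L, -1L, 0x0123456789ABCDEFL, Long.MIN_VALUE};

        boolean ok = true;
        for (float f : floats) ok &= negTwiceKeepsValue(f);
        for (int i : ints) ok &= reverseTwiceIsIdentity(i);
        for (long l : longs) ok &= reverseTwiceIsIdentity(l);

        System.out.println(ok ? "all involution checks passed"
                              : "some involution check failed");
    }
}
```

This only spot-checks a handful of inputs; the actual Identity transformation and its IR-level verification live in the PR itself.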
------------- PR Comment: https://git.openjdk.org/jdk/pull/23851#issuecomment-2694504151 From dlunden at openjdk.org Mon Mar 3 14:06:32 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 3 Mar 2025 14:06:32 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v7] In-Reply-To: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: > When searching for load anti-dependences in GCM, the memory state for the load is sometimes represented not only by the memory node input of the load, but also other memory nodes. Because PhaseCFG::insert_anti_dependences searches for anti-dependences only from the load's memory input, it is, therefore, possible to sometimes overlook anti-dependences. The result is that loads are potentially scheduled too late, after stores that redefine the memory states of the loads. > > ### Changeset > > It is not yet clear why multiple nodes sometimes represent the memory state of a load, nor if this is expected. We can, however, resolve all the miscompiled test cases seen in this issue by improving the idealization of Phi nodes. Specifically, there is an idealization where we split Phis through input MergeMems, that we, prior to this changeset, applied too conservatively. > > To illustrate the idealization and how it resolves this issue, consider the example below. > > ![failure-graph-1](https://github.com/user-attachments/assets/ecbd204f-bdf0-49cb-a62e-8081d08cfe0c) > > `64 membar_release` is a critical anti-dependence for `183 loadI`. The anti-dependence search starts at the load's direct memory input, `107 Phi`, and stops immediately at Phis. Therefore, the search ends at `106 Phi` and we never find `64 membar_release`. 
> > We can apply the split-through-MergeMem Phi idealization to `119 Phi`. This idealization pushes `119 Phi` through `120 MergeMem` and `121 MergeMem`, splitting it into the individual inputs of the MergeMems in the process. As a result, we replace `119 Phi` with two new Phis. One of these generated Phis has identical inputs to `107 Phi` (`106 Phi` and `104 Phi`), and further idealizations will merge this new Phi and `107 Phi`. As a result, `107 Phi` then has a Phi-free path to `64 membar_release` and we correctly discover the anti-dependence. > > The changeset consists of the following changes. > - Add an analysis that allows applying the split-through-MergeMem idealization in more cases than before (including in the above example) while still ensuring termination. > - Add a missing `ResourceMark` in `PhiNode::split_out_instance`. > - Add multiple new regression tests in `TestGCMLoadPlacement.java`. > > For reference, [here](https://github.com/openjdk/jdk/pull/22852) is a previous PR with an alternative fix that we decided to discard in favor of the fix in this PR. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/13394882532) > - `tier1` to `tier4` (an... 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Update missing copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23691/files - new: https://git.openjdk.org/jdk/pull/23691/files/a4c1a15e..b2ab85b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23691&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23691&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23691.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23691/head:pull/23691 PR: https://git.openjdk.org/jdk/pull/23691 From hgreule at openjdk.org Mon Mar 3 14:09:52 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 3 Mar 2025 14:09:52 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations In-Reply-To: <14bzIjSRLU1mOwCli60u7AnnrH8u-X2b81DIHXttdvU=.aaeb755d-2d55-43d8-82f5-8aab013b0f34@github.com> References: <8RU1UJbYq5dsJWwpIF_NOyPTpDNSDyUYAo06uApOHh4=.7744316c-fc1f-4785-a04b-cd4d59b01915@github.com> <14bzIjSRLU1mOwCli60u7AnnrH8u-X2b81DIHXttdvU=.aaeb755d-2d55-43d8-82f5-8aab013b0f34@github.com> Message-ID: On Mon, 3 Mar 2025 03:26:57 GMT, Chen Liang wrote: >> test/hotspot/jtreg/compiler/c2/irTests/InvolutionIdentityTests.java line 118: >> >>> 116: >>> 117: @Test >>> 118: @IR(failOn = {IRNode.REVERSE_BYTES_I}) >> >> I'm not sure if this is fine as the ReverseBytes nodes depend on intrinsics. From my understanding, the methods are just seen as normal methods on platforms without reverseBytes support. In that case, the test would still pass, but it might be surprising that it passes. Is this fine or is there a better way here? > > Interesting question - expanding on that, could arbitrary methods be marked as self-inverse to be represented in the IR? I don't think there are many such methods, and my knowledge of Ideal isn't good enough to judge whether that's possible.
But it might make sense to use the nodes even when the intrinsic isn't available, and "lower" it to the existing implementation (either a call or inlining) after optimizing the node itself. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23851#discussion_r1977576936 From kvn at openjdk.org Mon Mar 3 15:55:04 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 3 Mar 2025 15:55:04 GMT Subject: RFR: 8350893: Use generated names for hand generated opto runtime blobs [v2] In-Reply-To: References: Message-ID: <9iNcZ3ykM-NQoszuhrICz9DKm854uZHXJKMgJ3kdyKM=.42118681-12a7-4a73-a67d-01f5d065d9df@github.com> On Mon, 3 Mar 2025 10:11:38 GMT, Andrew Dinn wrote: >> The two special case opto runtime blobs that support uncommon trap and exception handling are currently being generated using hard wired blob names determined by port-specific code. They should employ the standard blob names generated from shared declarations in file stubDeclarations.hpp. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > correct error in arm stub name Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23829#pullrequestreview-2654539068 From kvn at openjdk.org Mon Mar 3 15:55:05 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 3 Mar 2025 15:55:05 GMT Subject: RFR: 8350893: Use generated names for hand generated opto runtime blobs In-Reply-To: References: Message-ID: <4YAo_iY0X-5pGaf2xTi2Wk3rNPz0R7XRdAhZLmb2zR4=.4f56db7e-f9a1-4974-9f98-6c877f24b2cf@github.com> On Mon, 3 Mar 2025 10:08:25 GMT, Andrew Dinn wrote: > Agreed. Although, I don't think that relates to this change. I am investigating. Thank you. To be clear, it is not for these changes. I just thought you may know something since you are touching this code. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23829#issuecomment-2694832327 From qamai at openjdk.org Mon Mar 3 15:58:00 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 3 Mar 2025 15:58:00 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v45] In-Reply-To: References: Message-ID: <7CxCdNhLfQtmt4CkgpQ-nJoBcjB2TKpNA2omkpFX958=.56b71763-f3dd-4130-95a4-0f664eaf7c2c@github.com> > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 59 commits: - reviews - Merge branch 'master' into unsignedbounds - refine comments - Merge branch 'master' into unsignedbounds - Merge branch 'master' into unsignedbounds - harden SimpleCanonicalResult - number lemmas - include - clean up intn_t - refine first_violation - ... 
and 49 more: https://git.openjdk.org/jdk/compare/4a51c61b...727216ff ------------- Changes: https://git.openjdk.org/jdk/pull/17508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=44 Stats: 2353 lines in 13 files changed: 1789 ins; 328 del; 236 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Mon Mar 3 15:58:01 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 3 Mar 2025 15:58:01 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v44] In-Reply-To: <_sz68hEt9TeSdTUMkhFPSagjM5MuVVa1RK3pvrkjJmA=.abb9a9d0-e4fc-43b4-9d7f-e683b06441e5@github.com> References: <_sz68hEt9TeSdTUMkhFPSagjM5MuVVa1RK3pvrkjJmA=.abb9a9d0-e4fc-43b4-9d7f-e683b06441e5@github.com> Message-ID: On Wed, 26 Feb 2025 11:09:31 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> refine comments > > src/hotspot/share/opto/rangeinference.cpp line 187: > >> 185: bits at x > j have lower significance, and are thus irrelevant >> 186: >> 187: Which leads to r < lo, which contradicts that r >= lo > > Suggestion: > > Which leads to r < lo, which contradicts that r >= lo (according to definition of r) > > It could be nice to define it more explicitly, and give the definition of `r` a title / name. We can say according to 2.1 here, done that! 
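The constraint set quoted in the 8315066 description (`x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`) can be read as a plain membership predicate. The following is only a minimal sketch under that reading: the class `TypeIntSketch` and the helper `contains` are hypothetical names for illustration and are not HotSpot's `TypeInt` API. It checks the example from the description, namely that an int in [0, 3] in the signed domain describes the same set once the unsigned bounds and known-zero bits are tightened.

```java
public class TypeIntSketch {
    // Hypothetical helper (not HotSpot code): true if x satisfies all six constraints.
    static boolean contains(int x, int lo, int hi, int ulo, int uhi, int zeros, int ones) {
        return x >= lo && x <= hi                      // signed bounds
            && Integer.compareUnsigned(x, ulo) >= 0    // unsigned lower bound
            && Integer.compareUnsigned(x, uhi) <= 0    // unsigned upper bound
            && (x & zeros) == 0                        // bits known to be 0
            && (x & ones) == ones;                     // bits known to be 1
    }

    public static void main(String[] args) {
        // Loose form: only the signed bounds [0, 3] carry information
        // (uhi = -1 is the unsigned maximum, zeros = ones = 0).
        // Tightened form: unsigned bounds [0, 3], all bits above the low two known zero.
        for (int x = -8; x <= 8; x++) {
            boolean loose = contains(x, 0, 3, 0, -1, 0, 0);
            boolean tight = contains(x, 0, 3, 0, 3, ~3, 0);
            if (loose != tight) {
                throw new AssertionError("mismatch at " + x);
            }
        }
        System.out.println("loose and tightened constraints agree on [-8, 8]");
    }
}
```

Both tuples accept exactly the same values, which is the sense in which the description calls the constraints "not independent"; the tightened tuple is simply the more informative representative of that set.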
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1977761618 From qamai at openjdk.org Mon Mar 3 16:55:13 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 3 Mar 2025 16:55:13 GMT Subject: RFR: 8346664: C2: Optimize mask check with constant offset [v24] In-Reply-To: References: Message-ID: <79lafTUHfpKJTTQEHoN7w5SIkeG7GeHadtgOc_hu4vk=.6b576897-c540-44ab-bc79-0bb31725f46e@github.com> On Fri, 14 Feb 2025 07:18:52 GMT, Matthias Ernst wrote: >> Fixes [JDK-8346664](https://bugs.openjdk.org/browse/JDK-8346664): extends the optimization of masked sums introduced in #6697 to cover constant values, which currently break the optimization. >> >> Such constant values arise in an expression of the following form, for example from `MemorySegmentImpl#isAlignedForElement`: >> >> >> (base + (index + 1) << 8) & 255 >> => MulNode >> (base + (index << 8 + 256)) & 255 >> => AddNode >> ((base + index << 8) + 256) & 255 >> >> >> Currently, `256` is not being recognized as a shifted value. This PR enables further reduction: >> >> >> ((base + index << 8) + 256) & 255 >> => MulNode (this PR) >> (base + index << 8) & 255 >> => MulNode (PR #6697) >> base & 255 (loop invariant) >> >> >> Implementation notes: >> * I verified that the originating issue "scaled varhandle indexed with i+1" (https://mail.openjdk.org/pipermail/panama-dev/2024-December/020835.html) is resolved with this PR. >> * ~in order to stay with the flow of the current implementation, I refrained from solving general (const & mask)==0 cases, but only those where const == _ << shift.~ >> * ~I modified existing test cases adding/subtracting from the index var (which would fail with current C2). Let me know if would like to see separate cases for these.~ > > Matthias Ernst has updated the pull request incrementally with one additional commit since the last revision: > > incorporate @eme64's comment suggestions The issue is interesting. 
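For reference, the arithmetic behind the reduction quoted above, `(base + (index + 1) << 8) & 255` down to `base & 255`, can be checked with a small sketch. This assumes the shift is meant to bind tighter than the addition, i.e. `base + ((index + 1) << 8)`, so the parentheses are written out explicitly; the class and method names are hypothetical and this is not the C2 transformation itself, only the identity it relies on.

```java
import java.util.Random;

public class MaskedSumCheck {
    // Source-level form, with the shift parenthesized explicitly (an assumption
    // about the intended grouping in the quoted expression).
    static long maskedSum(long base, long index) {
        return (base + ((index + 1) << 8)) & 255;
    }

    // Fully reduced form: the shifted term has its low 8 bits clear,
    // so it cannot influence the low 8 bits of the sum.
    static long reduced(long base) {
        return base & 255;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            long base = random.nextLong();
            long index = random.nextLong();
            if (maskedSum(base, index) != reduced(base)) {
                throw new AssertionError("identity violated for base=" + base + ", index=" + index);
            }
        }
        System.out.println("identity held for all sampled inputs");
    }
}
```

The identity holds under two's-complement wraparound as well, because the low 8 bits of a sum depend only on the low 8 bits of its operands.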
The code shape looks like this `AndI(CastII(Phi), ConI)`. `Phi` is a constant 0 while `CastII` is `top`. `CastII` is `top` because its control input is dead. And, as a result, `CastII` does not change after CCP and we do not push its inputs (in this case the `AndI`) to the worklist, resulting in the `AndI` remains to be a `top`. On the other hand, `AndINode::Value` looks through `CastIINode`s (in `AndIL_min_trailing_zeros` we do `expr = expr->uncast()`). Which means that it sees the `Phi` being a constant 0, and returns `TypeInt::ZERO`. Of course, in this particular case, a solution is to check for top inputs before proceeding with any action. However, I think that it is not sufficient. Given that we often look through `CastNode`s when doing inference, I think it is suitable we push nodes to the worklist through `CastNode`s, too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22856#issuecomment-2694997713 From shade at openjdk.org Mon Mar 3 18:27:58 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 3 Mar 2025 18:27:58 GMT Subject: RFR: 8350893: Use generated names for hand generated opto runtime blobs [v2] In-Reply-To: References: Message-ID: <3slACYq-NWB6drMXvVSvd10poQExmhZ-yGjSzoM01T8=.e0e01c24-6a30-480a-9331-8d04b3de8913@github.com> On Mon, 3 Mar 2025 10:11:38 GMT, Andrew Dinn wrote: >> The two special case opto runtime blobs that support uncommon trap and exception handling are currently being generated using hard wired blob names determined by port-specific code. They should employ the standard blob names generated from shared declarations in file stubDeclarations.hpp. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > correct error in arm stub name > > Hm. It looks like in release builds all of these would be "runtime stub"? Is that expected? > We could perhaps fix it to say something more useful. 
Maybe in a separate PR in case something depends on it having that value in release builds? I am confused that we have `OptoRuntime::stub_id(...)`, and then we also have newly added by you: runtime.hpp: // Returns the name associated with a given stub id static const char* stub_name(OptoStubId id) { assert(id > OptoStubId::NO_STUBID && id < OptoStubId::NUM_STUBIDS, "stub id out of range"); return _stub_names[(int)id]; } Maybe `OptoRuntime::stub_name` should be calling that one directly, instead of going all the way through `CodeCache::find_blob`? Then we don't need any debug-defines there. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23829#issuecomment-2695217875 From mli at openjdk.org Mon Mar 3 18:31:09 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 3 Mar 2025 18:31:09 GMT Subject: Integrated: 8351033: RISC-V: TestFloat16ScalarOperations asserts with offset (4210) is too large to be patched in one beq/bge/bgeu/blt/bltu/bne instruction! In-Reply-To: References: Message-ID: <3jn2c0kg1qgYdqgngAXoEOjoR3B6wlSNiezQIuPXztU=.11763c1a-fc1f-4264-adf6-3a59d3420947@github.com> On Mon, 3 Mar 2025 10:30:54 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? > This patch could also fix JDK-8340884. > [JDK-8351033](https://bugs.openjdk.org/browse/JDK-8351033) is quite similar to [JDK-8340884](https://bugs.openjdk.org/browse/JDK-8340884). > The difference is that in [JDK-8340884](https://bugs.openjdk.org/browse/JDK-8340884), offset is quite bigger 134990567309312. > > Thanks This pull request has now been integrated. Changeset: 79880e56 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/79880e56375a1c17ec6ad29bb0ab01868bc956ff Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod 8351033: RISC-V: TestFloat16ScalarOperations asserts with offset (4210) is too large to be patched in one beq/bge/bgeu/blt/bltu/bne instruction! 
Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/23856 From mli at openjdk.org Mon Mar 3 18:31:08 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 3 Mar 2025 18:31:08 GMT Subject: RFR: 8351033: RISC-V: TestFloat16ScalarOperations asserts with offset (4210) is too large to be patched in one beq/bge/bgeu/blt/bltu/bne instruction! In-Reply-To: <_1Q_UpmhBlYxP4To2bYdXImf7O2oVnCCbSK5cZYC9bQ=.db184029-ebc5-4dda-a5db-834942f2f078@github.com> References: <_1Q_UpmhBlYxP4To2bYdXImf7O2oVnCCbSK5cZYC9bQ=.db184029-ebc5-4dda-a5db-834942f2f078@github.com> Message-ID: On Mon, 3 Mar 2025 12:28:22 GMT, Fei Yang wrote: > Thanks! Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23856#issuecomment-2695222242 From mli at openjdk.org Mon Mar 3 18:32:09 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 3 Mar 2025 18:32:09 GMT Subject: RFR: 8350940: RISC-V: remove unnecessary assert_different_registers in minmax_fp [v4] In-Reply-To: <4PGqc4-JerCL5wzS_4DJDCpiP0Tc6xN5s7HH2LuV9Ao=.5a2ca4a7-d660-4bdb-ac85-4e2d0f2a15de@github.com> References: <8fxIj9ChMAOETSVV62zYzhRZRfCmEDRtvMc88hncB5E=.4c39c81b-2004-4d72-8aa2-ff63be3996a4@github.com> <4PGqc4-JerCL5wzS_4DJDCpiP0Tc6xN5s7HH2LuV9Ao=.5a2ca4a7-d660-4bdb-ac85-4e2d0f2a15de@github.com> Message-ID: <043U30MGXmFymyFaHYpMSG3RGJR1pgii3MymhqUrVdI=.c632b702-df80-4a53-9f36-c6e917123775@github.com> On Fri, 28 Feb 2025 12:56:10 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this simple change? >> Seems to me it's not necessary to assert_different_registers between dst/src1/src2 in minmax_fp. >> >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > keep cr/t1 in effect Thank you! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23842#issuecomment-2695223698 From mli at openjdk.org Mon Mar 3 18:32:09 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 3 Mar 2025 18:32:09 GMT Subject: Integrated: 8350940: RISC-V: remove unnecessary assert_different_registers in minmax_fp In-Reply-To: <8fxIj9ChMAOETSVV62zYzhRZRfCmEDRtvMc88hncB5E=.4c39c81b-2004-4d72-8aa2-ff63be3996a4@github.com> References: <8fxIj9ChMAOETSVV62zYzhRZRfCmEDRtvMc88hncB5E=.4c39c81b-2004-4d72-8aa2-ff63be3996a4@github.com> Message-ID: On Fri, 28 Feb 2025 11:41:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple change? > Seems to me it's not necessary to assert_different_registers between dst/src1/src2 in minmax_fp. > > Thanks This pull request has now been integrated. Changeset: e1fc14fa Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/e1fc14fa17e78fef712b5635ee53d10d6d2bb50e Stats: 7 lines in 2 files changed: 0 ins; 3 del; 4 mod 8350940: RISC-V: remove unnecessary assert_different_registers in minmax_fp Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/23842 From mli at openjdk.org Mon Mar 3 18:35:02 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 3 Mar 2025 18:35:02 GMT Subject: RFR: 8350931: RISC-V: remove unnecessary src register for fp_sqrt_d/f In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 10:19:13 GMT, Hamlin Li wrote: > Hi, > Can you review this simple patch? > It remove the unnecessary src register for fp_sqrt_d/f pipe_class, as sqrt has only one src register. > Thanks Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23839#issuecomment-2695224924 From mli at openjdk.org Mon Mar 3 18:35:03 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 3 Mar 2025 18:35:03 GMT Subject: Integrated: 8350931: RISC-V: remove unnecessary src register for fp_sqrt_d/f In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 10:19:13 GMT, Hamlin Li wrote: > Hi, > Can you review this simple patch? 
> It remove the unnecessary src register for fp_sqrt_d/f pipe_class, as sqrt has only one src register. > Thanks This pull request has now been integrated. Changeset: f53de920 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/f53de9208cf5f841ddf80ef9c6073fa61f68fa59 Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod 8350931: RISC-V: remove unnecessary src register for fp_sqrt_d/f Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/23839 From mli at openjdk.org Mon Mar 3 18:35:04 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 3 Mar 2025 18:35:04 GMT Subject: Integrated: 8350095: RISC-V: Refactor string_compare In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 14:32:07 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > Currently, `string_compare` code is a bit complicated, main reasons include: > 1. it has 2 piece of code respectively for LU and UL case, this is not necessary, basically LU and UL behaviour quite similar. > 2. it mixed LL/UU and LU/UL case together, better to separate them, as they are quite different from each other. > > This is not good for code reading and maintaining. > > > So, this patch does following refactoring: > 1. merge LU and UL code into one, i.e. remove UL code. > 2. seperate the code into 2 methods: LL/UU and LU/UL. > 3. some other misc improvement. > > I could do the following refactoring in another following pr, as in this pr I'm just moving code and removing code, it's easier to do it and review it. In particular the first one, as it needs to rewrite the existing code for UL/LU case. > 1. move alignment code of `generate_compare_long_string_different_encoding` upwards to `string_compare_long_LU`. > 2. make `SHORT_STRING` case simpler. > > > > Thanks This pull request has now been integrated. 
Changeset: e470f474 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/e470f474ee2176eecc211ec8e99cccc941104c68 Stats: 363 lines in 4 files changed: 193 ins; 156 del; 14 mod 8350095: RISC-V: Refactor string_compare Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/23633 From cushon at openjdk.org Mon Mar 3 18:51:28 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Mon, 3 Mar 2025 18:51:28 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint Message-ID: Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. ------------- Commit messages: - Merge branch 'openjdk:master' into mernst/JDK-8350563 - push `con->(cast*)->and` uses Changes: https://git.openjdk.org/jdk/pull/23871/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350563 Stats: 29 lines in 1 file changed: 14 ins; 3 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/23871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23871/head:pull/23871 PR: https://git.openjdk.org/jdk/pull/23871 From bulasevich at openjdk.org Mon Mar 3 20:11:59 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 3 Mar 2025 20:11:59 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v13] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: <7d8H3XUeBH5Rw2gGfsrvOaqeMiIknIx5jtI62_eEE6c=.1a37ac19-ecd5-4e53-973b-713c1e00a4fa@github.com> On Thu, 27 Feb 2025 14:31:31 GMT, Boris Ulasevich wrote: >> This change relocates mutable data (such as relocations, metadata, jvmci data) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data out of the CodeCache. 
>> >> OOPs was initially moved to a new mutable data blob, but then moved back to nmethod due to performance issues on dacapo benchmarks on aarch with ShenandoahGC (why Shenandoah: it is the only GC with supports_instruction_patching=false - it requires loading from the oops table in compiled code, which takes three instructions for remote data). >> >> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. >> >> The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of nmethod. Example (statistics collected on the CodeCacheStress benchmark): >> - nmethod_count:134000, total_compilation_time: 510460ms >> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, >> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB >> >> Functional testing: jtreg on arm/aarch/x86. >> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. >> >> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. > > Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - cleanup > - returning oops back to nmethods. jtreg: Ok, performance: Ok. 
todo: cleanup > - Address review comments: cleanup, move fields to avoid padding, fix CodeBlob purge to call os::free, fix nmethod::print, update Layout description > - add a separate adrp_movk function to support targets located more than 4GB away > - Force the use of movk in combination with adrp and ldr instructions to address scenarios > where os::malloc allocates buffers beyond the typical ±4GB range accessible with adrp > - Fixing TestFindInstMemRecursion test fail with XX:+StressReflectiveCode option: > _relocation_size can exceed 64Kb, in this case _metadata_offset do not fit into int16. > Fix: use _oops_size int16 field to calculate metadata offset > - removing dead code > - a bit of cleanup and addressing review suggestions > - rework movoop for not_supports_instruction_patching case: correcting in ldr_constant and relocations fixup > - remove _code_end_offset > - ... and 4 more: https://git.openjdk.org/jdk/compare/3c9d64eb...56c0cc78 As agreed, I moved oops back to nmethod, significantly reducing the change. All AArch-specific modifications (encoding of long load with adrp+movk+ldr and its relocation patching) were reverted. Testing results: - Builds: AArch & x86, client build, GraalVM build - jtreg (hotspot & jdk tier1-3): G1/ZGC/Shenandoah/Xcomp/TieredStopAtLevel=3/-TieredCompilation - No regressions - DaCapo & Renaissance benchmarks - No regressions Here is the PrintNMethodStatistics printout. It shows that in the application (a large Renaissance Dotty benchmark), we observe a significant reduction in CodeCache usage. 
Statistics for 20625 bytecoded nmethods for C1: total size = 121587728 (100%) in CodeCache = 80406760 (66.130653%) header = 4950000 (6.156199%) constants = 640 (0.000796%) main code = 69890600 (86.921303%) stub code = 4923768 (6.123575%) oops = 476752 (0.592925%) mutable data = 10163920 (8.359330%) relocation = 6810824 (67.009811%) metadata = 3353096 (32.990185%) immutable data = 31017048 (25.510014%) dependencies = 606216 (1.954461%) nul chk table = 724344 (2.335309%) handler table = 222464 (0.717231%) scopes pcs = 15817888 (50.997398%) scopes data = 13646136 (43.995602%) Statistics for 8290 bytecoded nmethods for C2 | Statistics for 8442 bytecoded nmethods for JVMCI total size = 66679688 (100%) | total size = 46208136 (100%) in CodeCache = 26004920 (38.999763%) | in CodeCache = 19489616 (42.177887%) header = 1989600 (7.650860%) | header = 2026080 (10.395690%) constants = 1920 (0.007383%) | constants = 540288 (2.772184%) main code = 20949456 (80.559586%) | main code = 14737620 (75.617805%) stub code = 2702064 (10.390588%) | stub code = 1904548 (9.772117%) oops = 295560 (1.136554%) | oops = 213544 (1.095681%) mutable data = 6564928 (9.845469%) | mutable data = 4168848 (9.021892%) relocation = 3542736 (53.964584%) | relocation = 1671384 (40.092228%) > JVMCI data = 202608 (4.860048%) metadata = 3022192 (46.035416%) | metadata = 2294856 (55.047726%) immutable data = 34109840 (51.154766%) | immutable data = 22549672 (48.800220%) dependencies = 988000 (2.896525%) | dependencies = 460104 (2.040402%) nul chk table = 554680 (1.626158%) | nul chk table = 618888 (2.744554%) handler table = 1787424 (5.240201%) | handler table = 20664 (0.091638%) scopes pcs = 16152224 (47.353561%) | scopes pcs = 10965040 (48.626163%) scopes data = 14627512 (42.883556%) | scopes data = 7746888 (34.354771%) > speculations = 2738088 (12.142474%) By moving mutable data out of the CodeCache, we reduce CodeCache usage by the following percentages: - C1: 10163920/(10163920+80406760) = 11% - C2: 
6564928/(6564928+26004920) = 20% - JVMCI: 4168848/(4168848+19489616) = 18% ------------- PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2695425491 PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2695429860 From duke at openjdk.org Mon Mar 3 21:06:13 2025 From: duke at openjdk.org (Matthias Ernst) Date: Mon, 3 Mar 2025 21:06:13 GMT Subject: RFR: 8346664: C2: Optimize mask check with constant offset [v24] In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 07:18:52 GMT, Matthias Ernst wrote: >> Fixes [JDK-8346664](https://bugs.openjdk.org/browse/JDK-8346664): extends the optimization of masked sums introduced in #6697 to cover constant values, which currently break the optimization. >> >> Such constant values arise in an expression of the following form, for example from `MemorySegmentImpl#isAlignedForElement`: >> >> >> (base + (index + 1) << 8) & 255 >> => MulNode >> (base + (index << 8 + 256)) & 255 >> => AddNode >> ((base + index << 8) + 256) & 255 >> >> >> Currently, `256` is not being recognized as a shifted value. This PR enables further reduction: >> >> >> ((base + index << 8) + 256) & 255 >> => MulNode (this PR) >> (base + index << 8) & 255 >> => MulNode (PR #6697) >> base & 255 (loop invariant) >> >> >> Implementation notes: >> * I verified that the originating issue "scaled varhandle indexed with i+1" (https://mail.openjdk.org/pipermail/panama-dev/2024-December/020835.html) is resolved with this PR. >> * ~in order to stay with the flow of the current implementation, I refrained from solving general (const & mask)==0 cases, but only those where const == _ << shift.~ >> * ~I modified existing test cases adding/subtracting from the index var (which would fail with current C2). 
Let me know if would like to see separate cases for these.~ > > Matthias Ernst has updated the pull request incrementally with one additional commit since the last revision: > > incorporate @eme64's comment suggestions My proposed fix for this is in https://github.com/openjdk/jdk/pull/23871 . ------------- PR Comment: https://git.openjdk.org/jdk/pull/22856#issuecomment-2695528823 From dlong at openjdk.org Tue Mar 4 04:56:24 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 4 Mar 2025 04:56:24 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v5] In-Reply-To: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: > When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. > > In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. > > Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. 
Dean Long has updated the pull request incrementally with two additional commits since the last revision: - fix typo - moved and hopefully improved invokedynamic comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23557/files - new: https://git.openjdk.org/jdk/pull/23557/files/375f6cfe..80a3235a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23557&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23557&range=03-04 Stats: 7 lines in 2 files changed: 5 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23557.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23557/head:pull/23557 PR: https://git.openjdk.org/jdk/pull/23557 From dlong at openjdk.org Tue Mar 4 04:56:24 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 4 Mar 2025 04:56:24 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v4] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Sat, 1 Mar 2025 22:20:23 GMT, Richard Reingruber wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> use new Bytecode_invoke::has_memeber_arg > > src/hotspot/share/runtime/vframeArray.cpp line 616: > >> 614: // invokedynamic instructions don't have a class but obviously don't have a MemberName appendix. >> 615: // NOTE: Use machinery here that avoids resolving of any kind. >> 616: const bool has_member_arg = inv.has_member_arg(); > > I reckon the comment about invokedynamic isn't needed anymore. It could be moved to has_member_arg if you want to keep it. Good idea. Done! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1978611223 From dlunden at openjdk.org Tue Mar 4 07:09:34 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 4 Mar 2025 07:09:34 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v8] In-Reply-To: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: > When searching for load anti-dependences in GCM, the memory state for the load is sometimes represented not only by the memory node input of the load, but also other memory nodes. Because PhaseCFG::insert_anti_dependences searches for anti-dependences only from the load's memory input, it is, therefore, possible to sometimes overlook anti-dependences. The result is that loads are potentially scheduled too late, after stores that redefine the memory states of the loads. > > ### Changeset > > It is not yet clear why multiple nodes sometimes represent the memory state of a load, nor if this is expected. We can, however, resolve all the miscompiled test cases seen in this issue by improving the idealization of Phi nodes. Specifically, there is an idealization where we split Phis through input MergeMems, that we, prior to this changeset, applied too conservatively. > > To illustrate the idealization and how it resolves this issue, consider the example below. > > ![failure-graph-1](https://github.com/user-attachments/assets/ecbd204f-bdf0-49cb-a62e-8081d08cfe0c) > > `64 membar_release` is a critical anti-dependence for `183 loadI`. The anti-dependence search starts at the load's direct memory input, `107 Phi`, and stops immediately at Phis. Therefore, the search ends at `106 Phi` and we never find `64 membar_release`. 
> > We can apply the split-through-MergeMem Phi idealization to `119 Phi`. This idealization pushes `119 Phi` through `120 MergeMem` and `121 MergeMem`, splitting it into the individual inputs of the MergeMems in the process. As a result, we replace `119 Phi` with two new Phis. One of these generated Phis has identical inputs to `107 Phi` (`106 Phi` and `104 Phi`), and further idealizations will merge this new Phi and `107 Phi`. As a result, `107 Phi` then has a Phi-free path to `64 membar_release` and we correctly discover the anti-dependence. > > The changeset consists of the following changes. > - Add an analysis that allows applying the split-through-MergeMem idealization in more cases than before (including in the above example) while still ensuring termination. > - Add a missing `ResourceMark` in `PhiNode::split_out_instance`. > - Add multiple new regression tests in `TestGCMLoadPlacement.java`. > > For reference, [here](https://github.com/openjdk/jdk/pull/22852) is a previous PR with an alternative fix that we decided to discard in favor of the fix in this PR. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/13394882532) > - `tier1` to `tier4` (an... Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into insert-anti-dependences-8333393+igvn+pr - Update missing copyright - Change to GrowableArray - Update after Christian's review - Fix subtle bug introduced in previous update - Update after review comments - Remove test that no longer reproduces the issue - First version ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23691/files - new: https://git.openjdk.org/jdk/pull/23691/files/b2ab85b0..959d916d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23691&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23691&range=06-07 Stats: 33467 lines in 986 files changed: 14299 ins; 14776 del; 4392 mod Patch: https://git.openjdk.org/jdk/pull/23691.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23691/head:pull/23691 PR: https://git.openjdk.org/jdk/pull/23691 From epeter at openjdk.org Tue Mar 4 07:42:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 07:42:57 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint In-Reply-To: References: Message-ID: <0pWDuxg-bYEv4knSpiOrblIyxVkZtxDh-z_76s8ASh4=.a54ce93c-6913-4b20-9b1c-a03e927c644a@github.com> On Mon, 3 Mar 2025 18:45:52 GMT, Liam Miller-Cushon wrote: > Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. @cushon @mernst-github Thanks for working on the fix! Can you please add a description to the PR (and if possible also on JIRA) to explain what the issue is, and how you are fixing it? Is there a regression test that reproduces this reliably? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23871#issuecomment-2696498361 From rcastanedalo at openjdk.org Tue Mar 4 07:45:58 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 4 Mar 2025 07:45:58 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v8] In-Reply-To: References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: On Tue, 4 Mar 2025 07:09:34 GMT, Daniel Lundén wrote: >> When searching for load anti-dependences in GCM, the memory state for the load is sometimes represented not only by the memory node input of the load, but also other memory nodes. Because PhaseCFG::insert_anti_dependences searches for anti-dependences only from the load's memory input, it is, therefore, possible to sometimes overlook anti-dependences. The result is that loads are potentially scheduled too late, after stores that redefine the memory states of the loads. >> >> ### Changeset >> >> It is not yet clear why multiple nodes sometimes represent the memory state of a load, nor if this is expected. We can, however, resolve all the miscompiled test cases seen in this issue by improving the idealization of Phi nodes. Specifically, there is an idealization where we split Phis through input MergeMems, that we, prior to this changeset, applied too conservatively. >> >> To illustrate the idealization and how it resolves this issue, consider the example below. >> >> ![failure-graph-1](https://github.com/user-attachments/assets/ecbd204f-bdf0-49cb-a62e-8081d08cfe0c) >> >> `64 membar_release` is a critical anti-dependence for `183 loadI`. The anti-dependence search starts at the load's direct memory input, `107 Phi`, and stops immediately at Phis. Therefore, the search ends at `106 Phi` and we never find `64 membar_release`. >> >> We can apply the split-through-MergeMem Phi idealization to `119 Phi`. 
This idealization pushes `119 Phi` through `120 MergeMem` and `121 MergeMem`, splitting it into the individual inputs of the MergeMems in the process. As a result, we replace `119 Phi` with two new Phis. One of these generated Phis has identical inputs to `107 Phi` (`106 Phi` and `104 Phi`), and further idealizations will merge this new Phi and `107 Phi`. As a result, `107 Phi` then has a Phi-free path to `64 membar_release` and we correctly discover the anti-dependence. >> >> The changeset consists of the following changes. >> - Add an analysis that allows applying the split-through-MergeMem idealization in more cases than before (including in the above example) while still ensuring termination. >> - Add a missing `ResourceMark` in `PhiNode::split_out_instance`. >> - Add multiple new regression tests in `TestGCMLoadPlacement.java`. >> >> For reference, [here](https://github.com/openjdk/jdk/pull/22852) is a previous PR with an alternative fix that we decided to discard in favor of the fix in this PR. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/ac... > > Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into insert-anti-dependences-8333393+igvn+pr > - Update missing copyright > - Change to GrowableArray > - Update after Christian's review > - Fix subtle bug introduced in previous update > - Update after review comments > - Remove test that no longer reproduces the issue > - First version Marked as reviewed by rcastanedalo (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23691#pullrequestreview-2656308586 From epeter at openjdk.org Tue Mar 4 07:50:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 07:50:39 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears Message-ID: @rwestrel asked me for this here: https://github.com/openjdk/jdk/pull/22016#issuecomment-2684365921 The idea: an unused `multiversion_if` (i.e. one where the `slow_loop` is still delayed, meaning the `fast_loop` has not yet added a runtime-check to the `multiversion_if`) can be constant-folded if the main `fast_loop` disappears, because at that point we know that we will never add a new condition to the `multiversion_if`, and it will constant fold to true (towards the `fast_loop`) after loop-opts anyway. It also seems to fix a bug, where all multiversioned loops (fast / slow, pre/main/post) disappear, and then we are left with an if-diamond with the multiversion_if. This then hits assertion code in `PhaseIdealLoop::conditional_move`. This patch addresses that issue as well. I adjusted the code around that assert slightly to give better reporting and ensure we bail out of the optimization if we see an unexpected pattern. 
------------- Commit messages: - fix assert for VerifyLoopOpts - whitespace - added test - improve assert - JDK-8350756 Changes: https://git.openjdk.org/jdk/pull/23865/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23865&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350756 Stats: 156 lines in 6 files changed: 153 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23865.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23865/head:pull/23865 PR: https://git.openjdk.org/jdk/pull/23865 From xgong at openjdk.org Tue Mar 4 08:02:59 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 4 Mar 2025 08:02:59 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: <17arBLP__LxvP0MS9liI_o9HTVpzxDJrjy3LjYNn8Ng=.8be17d07-09b0-4d31-b41f-66d3c9be5fad@github.com> References: <17arBLP__LxvP0MS9liI_o9HTVpzxDJrjy3LjYNn8Ng=.8be17d07-09b0-4d31-b41f-66d3c9be5fad@github.com> Message-ID: <2H_Ol6dl2XhWKowrqvJbdAEFoHWYNu65dR60bjkIaPQ=.879025d0-d0bc-49d1-94e3-da69666c372c@github.com> On Mon, 3 Mar 2025 09:44:37 GMT, Bhavana Kilambi wrote: >> The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. >> >> Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. >> >> This patch added the rearrange support for vector types with small lane count. 
Here are the main changes: >> - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) >> - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation >> - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one >> >> Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. >> >> 1) NEON >> >> JMH on panama-vector:vectorIntrinsics: >> >> Benchmark (size) Mode Cnt Units Before After Gain >> Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x >> Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x >> Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x >> Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x >> Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x >> Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x >> Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x >> Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x >> Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x >> Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x >> Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x >> Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.... > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2595: > >> 2593: // type B/S/I/L/F/D, and the offset between two types is 16; Hence >> 2594: // the offset for L is 48. >> 2595: lea(rscratch1, > > Hi @XiaohongGong , thanks for adding support for 2D/2L as well. I was trying to implement the same for the two vector table and I am wondering what you think of this implementation - > > negr(dst, shuffle); // this would help create a mask. 
If input is 1, it would be all 1s and all 0s if it's 0
> dup(tmp1, src1, 0); // duplicate first element of src1
> dup(tmp2, src1, 1); // duplicate second element of src1
> bsl(dst, T16B, tmp2, tmp1); // Select from tmp2 if dst is 1 and from tmp1 if dst is 0
>
> I am really not sure which implementation would be faster though. This implementation might take around 8 cycles.

Hi @Bhavana-Kilambi , I've finished testing your suggestion on my Grace CPU. The vectorapi jtreg tests all pass, so this solution works well. However, the performance shows no obvious change compared with the current PR's codegen. Here is the performance data:

Benchmark                                      (size)  Mode   Cnt  Current    Bhavana's  Units   Gain
Double128Vector.rearrange                      1024    thrpt  30   591.504    588.616    ops/ms  0.995
Long128Vector.rearrange                        1024    thrpt  30   593.348    590.802    ops/ms  0.995
SelectFromBenchmark.rearrangeFromByteVector    1024    thrpt  30   16576.713  16664.580  ops/ms  1.005
SelectFromBenchmark.rearrangeFromByteVector    2048    thrpt  30   8358.694   8392.733   ops/ms  1.004
SelectFromBenchmark.rearrangeFromDoubleVector  1024    thrpt  30   1312.752   1213.538   ops/ms  0.924
SelectFromBenchmark.rearrangeFromDoubleVector  2048    thrpt  30   657.365    607.060    ops/ms  0.923
SelectFromBenchmark.rearrangeFromFloatVector   1024    thrpt  30   1905.595   1911.831   ops/ms  1.003
SelectFromBenchmark.rearrangeFromFloatVector   2048    thrpt  30   952.205    957.160    ops/ms  1.005
SelectFromBenchmark.rearrangeFromIntVector     1024    thrpt  30   2106.763   2107.238   ops/ms  1.000
SelectFromBenchmark.rearrangeFromIntVector     2048    thrpt  30   1056.299   1056.769   ops/ms  1.000
SelectFromBenchmark.rearrangeFromLongVector    1024    thrpt  30   1462.355   1247.853   ops/ms  0.853
SelectFromBenchmark.rearrangeFromLongVector    2048    thrpt  30   732.559    616.753    ops/ms  0.841
SelectFromBenchmark.rearrangeFromShortVector   1024    thrpt  30   4560.253   4559.861   ops/ms  0.999
SelectFromBenchmark.rearrangeFromShortVector   2048    thrpt  30   2279.058   2279.693   ops/ms  1.000
VectorXXH3HashingBenchmark.hashingKernel       1024    thrpt  30   1080.589   1073.883   ops/ms  0.993
VectorXXH3HashingBenchmark.hashingKernel       2048    thrpt  30   541.629    537.288    ops/ms  0.991
VectorXXH3HashingBenchmark.hashingKernel       4096    thrpt  30   269.886    268.460    ops/ms  0.994
VectorXXH3HashingBenchmark.hashingKernel       8192    thrpt  30   135.193    134.175    ops/ms  0.992

I expected an obvious improvement, since we no longer need the heavy `ldr` instruction. But I got similar performance data on an AArch64 N1 machine as well. One drawback of your suggestion that I can see is that it needs one more temp vector register. To be honest, I'm not sure which one is better. Maybe we need more performance data on different kinds of AArch64 machines. So, would you mind testing the performance on other AArch64 machines with NEON? Thanks a lot!

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r1978832690 From epeter at openjdk.org Tue Mar 4 08:15:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 08:15:03 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v10] In-Reply-To: <6P25Yy-0rkWudVp20tNwD1bWeozNUD0UoPdDlJIN7wc=.b07e7461-7af0-4fab-aa8b-a737b0b40591@github.com> References: <6P25Yy-0rkWudVp20tNwD1bWeozNUD0UoPdDlJIN7wc=.b07e7461-7af0-4fab-aa8b-a737b0b40591@github.com> Message-ID: On Thu, 27 Feb 2025 16:57:27 GMT, Roland Westrelin wrote: >> This change refactors `RShiftI`/`RShiftL` `Ideal`, `Identity` and >> `Value` because the `int` and `long` versions are very similar and so >> there's no logic duplication. In the process, support for some extra >> transformations is added to `RShiftL`. I also added some new test >> cases. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 17 additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8349361 > - review > - review > - review > - Merge branch 'master' into JDK-8349361 > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Emanuel Peter > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Emanuel Peter > - review > - Update src/hotspot/share/opto/mulnode.hpp > > Co-authored-by: Jasmine Karthikeyan <25208576+jaskarth at users.noreply.github.com> > - ... and 7 more: https://git.openjdk.org/jdk/compare/119a35df...d3b1cf08 @rwestrel Thanks for moving code so it is easier to review. I have a few more minor suggestions, but on the whole I'm happy with it :) I'd still like to run another round of testing once we have it all finished up though, so please ping me again once you have worked through my comments ;) src/hotspot/share/opto/mulnode.cpp line 1401: > 1399: const Node* and_node = in(1); > 1400: if (and_node->Opcode() == Op_And(bt) && > 1401: (mask_t = phase->type(and_node->in(2))->isa_integer(bt)) && I think this is an implicit nullptr check, right? It's not allowed according to the OpenJDK style guide, but since it was here already I leave it to you if you want to fix it. src/hotspot/share/opto/mulnode.cpp line 1431: > 1429: if (shift == 16 && > 1430: (left_shift_t = phase->type(shl->in(2))->isa_int()) && > 1431: left_shift_t->is_con(16)) { Suggestion: const TypeInt* left_shift_t = phase->type(shl->in(2))->isa_int(); if (shift == 16 && left_shift_t != nullptr && left_shift_t->is_con(16)) { Would that be equivalent? I think it would be more readable. src/hotspot/share/opto/mulnode.cpp line 1452: > 1450: // Check for "(byte[i] <<24)>>24" which simply sign-extends > 1451: if (shift == 24 && > 1452: (left_shift_t = phase->type(shl->in(2))->isa_int()) && Then you could also refactor down here. Because these are implicit nullptr checks again. 
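As a side note on the patterns quoted in the two comments above: the shift pairs `(x << 24) >> 24` and `(x << 16) >> 16` sign-extend the low 8 and low 16 bits of an `int`. The following is a minimal Java sketch of that arithmetic fact only; the class and method names are made up for illustration and this is not the C2 code under review.

```java
public class ShiftSignExtension {
    // (x << 24) >> 24 keeps the low 8 bits and sign-extends them,
    // which gives the same value as a narrowing cast to byte.
    static int signExtendByte(int x) {
        return (x << 24) >> 24;
    }

    // (x << 16) >> 16 keeps the low 16 bits and sign-extends them,
    // which gives the same value as a narrowing cast to short.
    static int signExtendShort(int x) {
        return (x << 16) >> 16;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 127, 128, 255, 256, 0x7FFF, 0x8000, 0xFFFF,
                         -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            if (signExtendByte(x) != (byte) x) {
                throw new AssertionError("byte mismatch for " + x);
            }
            if (signExtendShort(x) != (short) x) {
                throw new AssertionError("short mismatch for " + x);
            }
        }
        System.out.println("shift pairs match the narrowing casts");
    }
}
```

This equivalence is why a shift pair applied to a value that is already a sign-extended byte or short is redundant.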
src/hotspot/share/opto/mulnode.hpp line 325: > 323: class RShiftNode : public Node { > 324: public: > 325: RShiftNode(Node* in1, Node* in2) : Node(nullptr,in1,in2) {} Suggestion: RShiftNode(Node* in1, Node* in2) : Node(nullptr, in1, in2) {} src/hotspot/share/opto/mulnode.hpp line 336: > 334: class RShiftINode : public RShiftNode { > 335: public: > 336: RShiftINode(Node* in1, Node* in2) : RShiftNode(in1,in2) {} Suggestion: RShiftINode(Node* in1, Node* in2) : RShiftNode(in1, in2) {} src/hotspot/share/opto/mulnode.hpp line 351: > 349: class RShiftLNode : public RShiftNode { > 350: public: > 351: RShiftLNode(Node* in1, Node* in2) : RShiftNode(in1,in2) {} Suggestion: RShiftLNode(Node* in1, Node* in2) : RShiftNode(in1, in2) {} test/hotspot/jtreg/compiler/c2/irTests/RShiftINodeIdealizationTests.java line 40: > 38: } > 39: > 40: @Run(test = { "test1", "test2", "test3", "test4", "test5", "test6", "test7", "test8", "test9", "test10" }) For completeness sake, you should add the bug number above too ;) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23438#pullrequestreview-2656343579 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1978828495 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1978834660 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1978836066 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1978846142 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1978846474 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1978846725 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1978849191 From epeter at openjdk.org Tue Mar 4 08:16:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 08:16:56 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 13:27:00 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]` as a >> consequence, `i` is known to be `[0, arraySize-1]`. The range check >> can be eliminated. >> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. 
>> >> What I propose here is that when the `CastII` becomes dead, any CFG >> path that uses the `CastII` node is made unreachable. So in pseudo code: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered. >> >> The patch applies this to all `Type` nodes as with 8275202, I also ran >> into some rare corner cases with other types of nodes. The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). >> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII)` >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out of bound value occurs. >> >> This is likely fixable some other way (even though it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that there are more hiding), I think it would >> make sense to try some other way of approaching them. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review @rwestrel @galderz Are you two still working on this or is it ready for someone else to review? 
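For readers following the quoted description, the guard structure can be sketched in plain Java. This is only an illustration of the value-range reasoning (with `j <= arraySize`, `i >= 0` and `i < j`, the index is in `[0, arraySize-1]`), not of what C2 does internally; the class name, the constant `ARRAY_SIZE` and the helper `reachesAccess` are assumptions made up for this example.

```java
public class GuardedAccess {
    // Stands in for the constant arraySize from the quoted snippet.
    static final int ARRAY_SIZE = 10;

    // Returns true if the innermost branch (the array access) is reached.
    static boolean reachesAccess(int i, int j) {
        int[] array = new int[ARRAY_SIZE];
        if (j <= ARRAY_SIZE) {
            if (i >= 0) {
                if (i < j) {
                    // Here 0 <= i < j <= ARRAY_SIZE, so the access is in bounds.
                    int v = array[i];
                    return v == 0; // a fresh int array holds only zeros
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // If i is a constant that is positive but out of range (12 >= ARRAY_SIZE),
        // no value of j lets control reach the access: "i < j" is never taken.
        int i = 12;
        for (int j = -20; j <= 30; j++) {
            if (reachesAccess(i, j)) {
                throw new AssertionError("unexpectedly reached the access for j=" + j);
            }
        }
        System.out.println("access is unreachable for i=" + i);
    }
}
```

The loop in `main` checks by brute force what the description argues: once `i` is known to be out of range, the path guarded by `i < j` is dead, which is the path the proposal would mark as unreachable.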
------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2696582606 From epeter at openjdk.org Tue Mar 4 08:18:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 08:18:58 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v8] In-Reply-To: References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: On Tue, 4 Mar 2025 07:09:34 GMT, Daniel Lundén wrote: >> When searching for load anti-dependences in GCM, the memory state for the load is sometimes represented not only by the memory node input of the load, but also other memory nodes. Because PhaseCFG::insert_anti_dependences searches for anti-dependences only from the load's memory input, it is, therefore, possible to sometimes overlook anti-dependences. The result is that loads are potentially scheduled too late, after stores that redefine the memory states of the loads. >> >> ### Changeset >> >> It is not yet clear why multiple nodes sometimes represent the memory state of a load, nor if this is expected. We can, however, resolve all the miscompiled test cases seen in this issue by improving the idealization of Phi nodes. Specifically, there is an idealization where we split Phis through input MergeMems, that we, prior to this changeset, applied too conservatively. >> >> To illustrate the idealization and how it resolves this issue, consider the example below. >> >> ![failure-graph-1](https://github.com/user-attachments/assets/ecbd204f-bdf0-49cb-a62e-8081d08cfe0c) >> >> `64 membar_release` is a critical anti-dependence for `183 loadI`. The anti-dependence search starts at the load's direct memory input, `107 Phi`, and stops immediately at Phis. Therefore, the search ends at `106 Phi` and we never find `64 membar_release`. >> >> We can apply the split-through-MergeMem Phi idealization to `119 Phi`. 
This idealization pushes `119 Phi` through `120 MergeMem` and `121 MergeMem`, splitting it into the individual inputs of the MergeMems in the process. As a result, we replace `119 Phi` with two new Phis. One of these generated Phis has identical inputs to `107 Phi` (`106 Phi` and `104 Phi`), and further idealizations will merge this new Phi and `107 Phi`. As a result, `107 Phi` then has a Phi-free path to `64 membar_release` and we correctly discover the anti-dependence. >> >> The changeset consists of the following changes. >> - Add an analysis that allows applying the split-through-MergeMem idealization in more cases than before (including in the above example) while still ensuring termination. >> - Add a missing `ResourceMark` in `PhiNode::split_out_instance`. >> - Add multiple new regression tests in `TestGCMLoadPlacement.java`. >> >> For reference, [here](https://github.com/openjdk/jdk/pull/22852) is a previous PR with an alternative fix that we decided to discard in favor of the fix in this PR. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/ac... > > Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into insert-anti-dependences-8333393+igvn+pr > - Update missing copyright > - Change to GrowableArray > - Update after Christian's review > - Fix subtle bug introduced in previous update > - Update after review comments > - Remove test that no longer reproduces the issue > - First version Marked as reviewed by epeter (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23691#pullrequestreview-2656395563 From rrich at openjdk.org Tue Mar 4 08:20:57 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 4 Mar 2025 08:20:57 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v5] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Tue, 4 Mar 2025 04:56:24 GMT, Dean Long wrote: >> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. >> >> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. >> >> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. > > Dean Long has updated the pull request incrementally with two additional commits since the last revision: > > - fix typo > - moved and hopefully improved invokedynamic comment Marked as reviewed by rrich (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23557#pullrequestreview-2656399880 From epeter at openjdk.org Tue Mar 4 08:31:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 08:31:56 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v5] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 18:00:14 GMT, Kangcheng Xu wrote: >> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495), which was [first merged](https://git.openjdk.org/jdk/pull/20754) and then backed out due to a regression. This patch redoes the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. >> >> When constantizing multiplications (possibly in the form of `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`). >> >> The following was implemented to address this issue. >> >> if (UseNewCode2) { >> *multiplier = bt == T_INT >> ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows >> : ((jlong) 1) << con->get_int(); >> } else { >> *multiplier = ((jlong) 1 << con->get_int()); >> } >> >> >> Two new bitshift overflow tests were added. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > remove tri-conditionals This looks interesting! Thanks @tabjy for the work! For now, I just have some drive-by comments about testing. Also: would it make sense to have a JMH benchmark to prove that this code change is beneficial enough to warrant the additional complexity? test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 35: > 33: /* > 34: * @test > 35: * @bug 8325495 Should we adjust / add the bug number? 
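The shift-by-32 mismatch mentioned in the quoted description (`(1 << 32) = 1` versus `(int) (1L << 32) = 0`) can be reproduced at the Java level. The sketch below only demonstrates Java's shift semantics, where an `int` shift distance is masked to its low 5 bits and a `long` shift distance to its low 6 bits; the class and method names are hypothetical and this is not the HotSpot C++ code from the patch.

```java
public class ShiftOverflowDemo {
    // Multiplier computed with an int shift: the distance is masked to 5 bits,
    // so a distance of 32 behaves like a distance of 0.
    static long multiplierViaIntShift(int shift) {
        return (long) (1 << shift);
    }

    // Multiplier computed with a long shift: the distance is masked to 6 bits,
    // so a distance of 32 really moves the bit to position 32.
    static long multiplierViaLongShift(int shift) {
        return 1L << shift;
    }

    public static void main(String[] args) {
        int shift = 32;
        long viaInt = multiplierViaIntShift(shift);
        long viaLong = multiplierViaLongShift(shift);
        int narrowed = (int) viaLong;

        // The int computation and the "compute in long, then narrow" computation
        // disagree for a shift distance of exactly 32.
        System.out.println("int shift:      " + viaInt);
        System.out.println("long shift:     " + viaLong);
        System.out.println("long, narrowed: " + narrowed);
    }
}
```

For shift distances below 32 the two routes agree after narrowing, which is why the discrepancy only shows up at the boundary value.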
test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 291: > 289: return i + (x << i) + i; // Expects 64 + 63 + 64 = 191 > 290: } > 291: } Would it make sense to add some randomized patterns, just for result verification? You can use `Generators.java` to get interesting values. Of course that would mean not doing IR verification, but at least it would give us better confidence that the values are correct. I'm imagining expressions like this: `return a * CON1 + a * CON2 + a * CON3 + a * CON4` Where the CON are defined as a `public static final` field with a random value generated by `Generators`. The advantage of using `Generators` is that it generates powers-of-two more frequently, which seems to be relevant here. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23506#pullrequestreview-2656414102 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1978870944 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1978880629 From bkilambi at openjdk.org Tue Mar 4 08:40:53 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 4 Mar 2025 08:40:53 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: <2H_Ol6dl2XhWKowrqvJbdAEFoHWYNu65dR60bjkIaPQ=.879025d0-d0bc-49d1-94e3-da69666c372c@github.com> References: <17arBLP__LxvP0MS9liI_o9HTVpzxDJrjy3LjYNn8Ng=.8be17d07-09b0-4d31-b41f-66d3c9be5fad@github.com> <2H_Ol6dl2XhWKowrqvJbdAEFoHWYNu65dR60bjkIaPQ=.879025d0-d0bc-49d1-94e3-da69666c372c@github.com> Message-ID: On Tue, 4 Mar 2025 08:00:24 GMT, Xiaohong Gong wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2595: >> >>> 2593: // type B/S/I/L/F/D, and the offset between two types is 16; Hence >>> 2594: // the offset for L is 48. >>> 2595: lea(rscratch1, >> >> Hi @XiaohongGong , thanks for adding support for 2D/2L as well. 
I was trying to implement the same for the two vector table and I am wondering what you think of this implementation - >> >> negr(dst, shuffle); // this would help create a mask. If input is 1, it would be all 1s and all 0s if its 0 >> dup(tmp1, src1, 0); // duplicate first element of src1 >> dup(tmp2, src1, 1); // duplicate second element of src1 >> bsl(dst, T16B, tmp2, tmp1); // Select from tmp2 if dst is 1 and from tmp1 if dst is 0 >> >> >> >> I am really not sure which implementation would be faster though. This implementation might take around 8 cycles. > > Hi @Bhavana-Kilambi , I'v finished the test with what you suggested on my Grace CPU. The vectorapi jtreg all pass. So this solution works well. But the performance seems no obvious change compared with the current PR's codegen as expected. > > Here is the performance data: > > Benchmark (size) Mode Cnt Current Bahavana's Units Gain > Double128Vector.rearrange 1024 thrpt 30 591.504 588.616 ops/ms 0.995 > Long128Vector.rearrange 1024 thrpt 30 593.348 590.802 ops/ms 0.995 > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 30 16576.713 16664.580 ops/ms 1.005 > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 30 8358.694 8392.733 ops/ms 1.004 > SelectFromBenchmark.rearrangeFromDoubleVector 1024 thrpt 30 1312.752 1213.538 ops/ms 0.924 > SelectFromBenchmark.rearrangeFromDoubleVector 2048 thrpt 30 657.365 607.060 ops/ms 0.923 > SelectFromBenchmark.rearrangeFromFloatVector 1024 thrpt 30 1905.595 1911.831 ops/ms 1.003 > SelectFromBenchmark.rearrangeFromFloatVector 2048 thrpt 30 952.205 957.160 ops/ms 1.005 > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 30 2106.763 2107.238 ops/ms 1.000 > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 30 1056.299 1056.769 ops/ms 1.000 > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 30 1462.355 1247.853 ops/ms 0.853 > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 30 732.559 616.753 ops/ms 0.841 > SelectFromBenchmark.rearrangeFromShortVector 
1024 thrpt 30 4560.253 4559.861 ops/ms 0.999 > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 30 2279.058 2279.693 ops/ms 1.000 > VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 30 1080.589 1073.883 ops/ms 0.993 > VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 30 541.629 537.288 ops/ms 0.991 > VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 30 269.886 268.460 ops/ms 0.994 > VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 30 135.193 134.175 ops/ms 0.992 > > > I expected it will have obvious improvement since we do not need the heavy `ldr` instruction. But I also got the similar performance data on an AArch64 n1 machine. One shortage of your suggestion I can see is it needs one more temp vector register. To be honest, I'm not sure which one i... Hi @XiaohongGong , thanks for testing this variation. I also expected it to have relatively better performance due to the absence of the load instruction. Maybe it might help in larger real-world workload where reducing some load instructions or having fewer instructions can help performance (by reducing pressure on icache/iTLB). Thinking of aarch64 Neon machines that we can test this on - we have only N1, V2 (Grace) machines which have support for 128-bit Neon. V1 is 256 bit Neon/SVE which will execute the `sve tbl` instruction instead. I can of course disable SVE and run the Neon instructions on V1 but I don't think that would really make any difference. So for 128-bit Neon machines, I can also test only on N1 and V2 which you've already done. Do you have a specific machine in mind that you'd like this to be tested on? 
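The negate/duplicate/select idea discussed in this thread can be emulated in scalar Java for a two-lane `long` vector. This is a hedged sketch of the logic only, assuming every shuffle lane holds 0 or 1; the class `TwoLaneSelect` and its method are made up for illustration, and it says nothing about the relative speed of the NEON sequences.

```java
public class TwoLaneSelect {
    // Emulates a 2-lane rearrange using a mask and a bitwise select.
    // Assumption: every shuffle lane is either 0 or 1.
    static long[] rearrange(long[] src, long[] shuffle) {
        long lane0 = src[0]; // plays the role of dup(tmp1, src1, 0)
        long lane1 = src[1]; // plays the role of dup(tmp2, src1, 1)
        long[] dst = new long[2];
        for (int i = 0; i < 2; i++) {
            // Negating the index builds the mask: 0 stays 0, 1 becomes all ones.
            long mask = -shuffle[i];
            // Bitwise select: take lane1 where the mask is set, lane0 elsewhere.
            dst[i] = (lane1 & mask) | (lane0 & ~mask);
        }
        return dst;
    }

    public static void main(String[] args) {
        long[] src = {10L, 20L};
        long[] swapped = rearrange(src, new long[] {1L, 0L});
        long[] broadcast = rearrange(src, new long[] {0L, 0L});
        System.out.println(swapped[0] + ", " + swapped[1]);
        System.out.println(broadcast[0] + ", " + broadcast[1]);
    }
}
```

The extra temporary mentioned in the thread corresponds to holding both `lane0` and `lane1` at the same time, whereas a table lookup needs only the source and the index vector.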
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r1978898324 From chagedorn at openjdk.org Tue Mar 4 08:51:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 4 Mar 2025 08:51:59 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears In-Reply-To: References: Message-ID: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> On Mon, 3 Mar 2025 13:38:55 GMT, Emanuel Peter wrote: > @rwestrel asked me for this here: > https://github.com/openjdk/jdk/pull/22016#issuecomment-2684365921 > > The idea: an unused `multiversion_if` (i.e. one where the `slow_loop` is still delayed, i.e. where the `fast_loop` has not yet added a runtime-check to the `multiversion_if`) can be constant-folded if the main `fast_loop` disappears. Because at that point we know that we will never add a new condition to the `multiversion_if`, and it will constant fold to true (towards the `fast_loop`) after loop-opts anyway. > > It also seems to fix a bug, where all multiversioned loops (fast / slow, pre/main/post) disappear, and then we are left with a if-diamond with the multiversion_if. This then hits assertion code in `PhaseIdealLoop::conditional_move`. This issue is also addressed with this patch here. I adjusted the code around that assert slightly to give better reporting and ensure we bail out of the optimization if we see an unexpected pattern. A few comments, otherwise the fix idea looks good to me! src/hotspot/share/opto/loopnode.cpp line 2743: > 2741: IfTrueNode* before_predicates = predicates.entry()->isa_IfTrue(); > 2742: if (before_predicates != nullptr && > 2743: before_predicates->in(0)->is_If() && Since this is after IGVN and we have not applied any loop transformations in this round, yet, shouldn't this always hold since we check that we have a `IfTrue` as child? 
src/hotspot/share/opto/loopnode.cpp line 6653: > 6651: if (!_verify_only && n->Opcode() == Op_OpaqueMultiversioning) { > 6652: _multiversion_opaque_nodes.push(n); > 6653: } I'm wondering if we should do it here or as a list in `Compile`, similar to what we do with Parse and Template Assertion Predicates: https://github.com/openjdk/jdk/blob/1f10ffba88119caab169b1fc43ccfd143e3b85a6/src/hotspot/share/opto/compile.hpp#L372-L374 Doing it in `PhaseIdealLoop` has the advantage that we limit the lifetime of the list whereas in `Compile` it's live until the entire compilation is done - which is a little too long. The disadvantage with having the list in `PhaseIdealLoop` is that you reallocate and reinitialize the list over and over again. So, I'm not entirely clear which one is better. Since we also do it like that for zero trip guards, we can also move forward with this and revisit it again later if we decide that we should move it to `Compile`. While thinking about this, it would have been great to have a class `LoopOpts` where we can store all such things that should be live over multiple loop opts pass but which is then not used anymore. Anyway, that's definitely out of scope. src/hotspot/share/opto/loopopts.cpp line 794: > 792: return nullptr; > 793: } > 794: if (bol->Opcode() != Op_Bool) { Could we use `!bol->is_Bool()` here? src/hotspot/share/opto/loopopts.cpp line 795: > 793: } > 794: if (bol->Opcode() != Op_Bool) { > 795: assert(false, "Expected Bool, but got %s", NodeClassNames[bol->Opcode()]); Good idea to print the opcode! 
src/hotspot/share/opto/opaquenode.hpp line 104:

> 102: private:
> 103:   bool _is_delayed_slow_loop;
> 104:   bool _is_useful;

I suggest flipping it to `_useless` to be in line with what we have in `ParsePredicateNode`:
https://github.com/openjdk/jdk/blob/1f10ffba88119caab169b1fc43ccfd143e3b85a6/src/hotspot/share/opto/cfgnode.hpp#L487-L489

Then we could also add a `dump_spec()` method that prints `#useless` if it becomes useless (similar to what we have for `ParsePredicateNode`):
https://github.com/openjdk/jdk/blob/1f10ffba88119caab169b1fc43ccfd143e3b85a6/src/hotspot/share/opto/ifnode.cpp#L2227-L2229

This is also useful to check in IGV if such an opaque node is useless or not.

Side note: I'm in the process of having such a `_useless` flag for `OpaqueTemplateAssertionPredicate` nodes as well instead of directly replacing it with a constant. This seems cleaner and does not interfere with pattern matching.

src/hotspot/share/opto/opaquenode.hpp line 121:

> 119:   }
> 120: 
> 121:   void set_useless() {

I suggest `mark_useless()` to be in line with `ParsePredicateNode`.
Suggestion:

  void mark_useless() {

test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java line 38:

> 36:  */
> 37: 
> 38: public class TestMultiversionRemoveUselessSlowLoop {

Is it also possible to add an IR test where we can check that an `OpaqueMultiversioning` node is first in the graph and then removed? Maybe we can check that in loop opts phase n, `COUNTED_LOOP_MAIN` and `OpaqueMultiversioning` are present. Then in loop opts phase n + 1, `COUNTED_LOOP_MAIN` is removed and in n + 2, `OpaqueMultiversioning` is removed as well. But I'm not sure if this is reliable enough when some loop opts change - though it could easily be fixed or dropped again if the test fails at some point.
------------- PR Review: https://git.openjdk.org/jdk/pull/23865#pullrequestreview-2656333142 PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1978827019 PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1978904536 PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1978825753 PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1978874732 PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1978822824 PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1978871295 PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1978887601 From epeter at openjdk.org Tue Mar 4 09:23:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 09:23:23 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v2] In-Reply-To: References: Message-ID: > @rwestrel asked me for this here: > https://github.com/openjdk/jdk/pull/22016#issuecomment-2684365921 > > The idea: an unused `multiversion_if` (i.e. one where the `slow_loop` is still delayed, i.e. where the `fast_loop` has not yet added a runtime-check to the `multiversion_if`) can be constant-folded if the main `fast_loop` disappears. Because at that point we know that we will never add a new condition to the `multiversion_if`, and it will constant fold to true (towards the `fast_loop`) after loop-opts anyway. > > It also seems to fix a bug, where all multiversioned loops (fast / slow, pre/main/post) disappear, and then we are left with a if-diamond with the multiversion_if. This then hits assertion code in `PhaseIdealLoop::conditional_move`. This issue is also addressed with this patch here. I adjusted the code around that assert slightly to give better reporting and ensure we bail out of the optimization if we see an unexpected pattern. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: for Christian v1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23865/files - new: https://git.openjdk.org/jdk/pull/23865/files/5ba11a71..50f502ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23865&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23865&range=00-01 Stats: 17 lines in 3 files changed: 10 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23865.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23865/head:pull/23865 PR: https://git.openjdk.org/jdk/pull/23865 From dnsimon at openjdk.org Tue Mar 4 09:23:08 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 4 Mar 2025 09:23:08 GMT Subject: RFR: 8351036: [JVMCI] value not an s2: -32776 Message-ID: This PR adds support for JVMCI to install code that requires stack slots whose offset > `Short.MAX_VALUE`. ------------- Commit messages: - support stack slots with an offset > Short.MAX_VALUE Changes: https://git.openjdk.org/jdk/pull/23888/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23888&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351036 Stats: 44 lines in 4 files changed: 36 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23888.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23888/head:pull/23888 PR: https://git.openjdk.org/jdk/pull/23888 From epeter at openjdk.org Tue Mar 4 09:23:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 09:23:23 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v2] In-Reply-To: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> References: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> Message-ID: <_smFw90Hxj94ssULI-ATwRfFhAPRrpYXF3XoARz2CW0=.87c423fb-2db5-4d2e-9d6e-270d32b8b64b@github.com> On Tue, 4 
Mar 2025 07:53:04 GMT, Christian Hagedorn wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   for Christian v1
> 
> src/hotspot/share/opto/opaquenode.hpp line 104:
> 
>> 102: private:
>> 103:   bool _is_delayed_slow_loop;
>> 104:   bool _is_useful;
> 
> I suggest flipping it to `_useless` to be in line with what we have in `ParsePredicateNode`:
> https://github.com/openjdk/jdk/blob/1f10ffba88119caab169b1fc43ccfd143e3b85a6/src/hotspot/share/opto/cfgnode.hpp#L487-L489
> 
> Then we could also add a `dump_spec()` method that prints `#useless` if it becomes useless (similar to what we have for `ParsePredicateNode`):
> https://github.com/openjdk/jdk/blob/1f10ffba88119caab169b1fc43ccfd143e3b85a6/src/hotspot/share/opto/ifnode.cpp#L2227-L2229
> 
> This is also useful to check in IGV if such an opaque node is useless or not.
> 
> Side note: I'm in the process of having such a `_useless` flag for `OpaqueTemplateAssertionPredicate` nodes as well instead of directly replacing it with a constant. This seems cleaner and does not interfere with pattern matching.

Refactored `_is_useful` -> `_useless`. Added `dump_spec`.

> src/hotspot/share/opto/opaquenode.hpp line 121:
> 
>> 119:   }
>> 120: 
>> 121:   void set_useless() {
> 
> I suggest `mark_useless()` to be in line with `ParsePredicateNode`.
> Suggestion:
> 
>   void mark_useless() {

Done.
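
The shape of the change agreed on above (a negative-sense flag, a `mark_useless()` setter, and a `dump_spec()` that prints `#useless`) can be sketched in standalone C++. The class below is a hypothetical stand-in, not the real `OpaqueMultiversioningNode`; the real `dump_spec()` writes to an output stream and may print more than this.

```cpp
#include <string>

// Hypothetical stand-in for an opaque node; not the HotSpot class.
class OpaqueNodeSketch {
 private:
  bool _is_delayed_slow_loop;
  bool _useless;  // set once the node no longer guards anything

 public:
  OpaqueNodeSketch() : _is_delayed_slow_loop(true), _useless(false) {}

  bool is_delayed_slow_loop() const { return _is_delayed_slow_loop; }
  bool is_useless() const { return _useless; }

  // Named mark_useless() rather than set_useless(), following the review.
  void mark_useless() { _useless = true; }

  // Models the extra text a node dump would carry, so that a graph
  // viewer can tell useless nodes apart at a glance.
  std::string dump_spec() const {
    return _useless ? " #useless" : "";
  }
};
```

Keeping the node in the graph and only flagging it, instead of replacing it with a constant right away, is the design the review comment describes as cleaner for pattern matching.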
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1978980527 PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1978981017 From epeter at openjdk.org Tue Mar 4 09:28:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 09:28:57 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v2] In-Reply-To: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> References: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> Message-ID: On Tue, 4 Mar 2025 07:56:03 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> for Christian v1 > > src/hotspot/share/opto/loopnode.cpp line 2743: > >> 2741: IfTrueNode* before_predicates = predicates.entry()->isa_IfTrue(); >> 2742: if (before_predicates != nullptr && >> 2743: before_predicates->in(0)->is_If() && > > Since this is after IGVN and we have not applied any loop transformations in this round, yet, shouldn't this always hold since we check that we have a `IfTrue` as child? You are probably right. But the method `CountedLoopNode::find_multiversion_if_from_multiversion_fast_main_loop` could be used from other contexts later. So I'd rather not make such assumptions. What do you think? > src/hotspot/share/opto/loopopts.cpp line 794: > >> 792: return nullptr; >> 793: } >> 794: if (bol->Opcode() != Op_Bool) { > > Could we use `!bol->is_Bool()` here? It would be equivalent now, since nobody inherits from `BoolNode`. But if someone ever inherits from it, then this condition would be weaker. Also: I just copied the condition. 
I can do it if you still want me to do it, I'm on the fence with this one ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1978994303 PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1978989179 From yzheng at openjdk.org Tue Mar 4 09:30:53 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 4 Mar 2025 09:30:53 GMT Subject: RFR: 8351036: [JVMCI] value not an s2: -32776 In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 09:18:40 GMT, Doug Simon wrote: > This PR adds support for JVMCI to install code that requires stack slots whose offset > `Short.MAX_VALUE`. Marked as reviewed by yzheng (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23888#pullrequestreview-2656634837 From epeter at openjdk.org Tue Mar 4 09:35:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 09:35:53 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v2] In-Reply-To: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> References: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> Message-ID: On Tue, 4 Mar 2025 08:31:14 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> for Christian v1 > > test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java line 38: > >> 36: */ >> 37: >> 38: public class TestMultiversionRemoveUselessSlowLoop { > > Is it also possible to add an IR test where we can check that a `OpaqueMultiversioning` node is first in the graph and then removed? Maybe we can check that in loop opts phase n, `COUNTED_LOOP_MAIN` and `OpaqueMultiversioning` is present. Then in loop opts phase n + 1, `COUNTED_LOOP_MAIN` is removed and in n + 2, `OpaqueMultiversioning` is removed as well. 
But I'm not sure if this is reliable enough when some loop opts change - though could easily be fixed or dropped again if the test fails at some point. I thought about it. I thought it would be quite hard to get a good test for this. Especially because it depends on the flags. But maybe I can create a test that has just the right flags, make it flagless, somehow engineer that the fast main loop decays in some specific loop-opts phase, and then in the next phase things must be cleaned up. The issue is: in the current example the slow_loop already is removed anyway. But it would be nice to have a case where after it all the fast pre loop would still be around, or the fast post loop... hmm. Maybe I can do it with fully unrolling the fast main loop, and the slow loop does not disappear because of it. Really tricky. Do you have a good idea @chhagedorn ? How much time would you like me to invest in this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1979004787 From epeter at openjdk.org Tue Mar 4 09:40:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 09:40:52 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v2] In-Reply-To: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> References: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> Message-ID: On Tue, 4 Mar 2025 08:41:46 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> for Christian v1 > > src/hotspot/share/opto/loopnode.cpp line 6653: > >> 6651: if (!_verify_only && n->Opcode() == Op_OpaqueMultiversioning) { >> 6652: _multiversion_opaque_nodes.push(n); >> 6653: } > > I'm wondering if we should do it here or as a list in `Compile`, similar to what we do with Parse and Template Assertion Predicates: > 
https://github.com/openjdk/jdk/blob/1f10ffba88119caab169b1fc43ccfd143e3b85a6/src/hotspot/share/opto/compile.hpp#L372-L374 > > Doing it in `PhaseIdealLoop` has the advantage that we limit the lifetime of the list whereas in `Compile` it's live until the entire compilation is done - which is a little too long. The disadvantage with having the list in `PhaseIdealLoop` is that you reallocate and reinitialize the list over and over again. So, I'm not entirely clear which one is better. > > Since we also do it like that for zero trip guards, we can also move forward with this and revisit it again later if we decide that we should move it to `Compile`. > > While thinking about this, it would have been great to have a class `LoopOpts` where we can store all such things that should be live over multiple loop opts pass but which is then not used anymore. Anyway, that's definitely out of scope. Yes, I just worked from `_zero_trip_guard_opaque_nodes` / `PhaseIdealLoop::eliminate_useless_zero_trip_guard` as I was suggested. I don't think that the life-time makes such a big difference, the array is quite small. Also: would we ever remove elements from that list if it is in `Compile`? Or would we just keep adding things, and iterate over more and more elements, even those that are long removed from the graph? If there is no clear reason to change the code, I'd prefer to leave it as is ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1979016556 From chagedorn at openjdk.org Tue Mar 4 10:02:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 4 Mar 2025 10:02:01 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v2] In-Reply-To: References: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> Message-ID: On Tue, 4 Mar 2025 09:26:39 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopnode.cpp line 2743: >> >>> 2741: IfTrueNode* before_predicates = predicates.entry()->isa_IfTrue(); >>> 2742: if (before_predicates != nullptr && >>> 2743: before_predicates->in(0)->is_If() && >> >> Since this is after IGVN and we have not applied any loop transformations in this round, yet, shouldn't this always hold since we check that we have a `IfTrue` as child? > > You are probably right. But the method `CountedLoopNode::find_multiversion_if_from_multiversion_fast_main_loop` could be used from other contexts later. So I'd rather not make such assumptions. What do you think? I don't think that the input of an `IfTrue` is anything else than an `IfNode` (or a subclass) while being outside of IGVN. Otherwise, the graph is broken. Moreover, if we decide to call this method from IGVN at some point, we face another problem that predicate iteration is not safe during IGVN - and we probably do not want to make it safe if there is not a very strong reason for it. >> src/hotspot/share/opto/loopopts.cpp line 794: >> >>> 792: return nullptr; >>> 793: } >>> 794: if (bol->Opcode() != Op_Bool) { >> >> Could we use `!bol->is_Bool()` here? > > It would be equivalent now, since nobody inherits from `BoolNode`. But if someone ever inherits from it, then this condition would be weaker. Also: I just copied the condition. 
> > I can do it if you still want me to do it, I'm on the fence with this one ;) I'm not sure if we were ever to extend `BoolNode`. And if so, I think implementing this requires to check all the `is_Bool()` conditions in our code base anyways - which I expect we have quite a lot of them. And it could be that a subclass would also allow to apply a cmove transformation. So, given that it's unlikely, I would be more inclined to change it to `is_Bool()`. But it's your call. I'm fine with both :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1979062802 PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1979061916 From chagedorn at openjdk.org Tue Mar 4 10:07:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 4 Mar 2025 10:07:53 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v2] In-Reply-To: References: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> Message-ID: On Tue, 4 Mar 2025 09:38:34 GMT, Emanuel Peter wrote: > I don't think that the life-time makes such a big difference, the array is quite small. I agree with that. > Also: would we ever remove elements from that list if it is in Compile? Or would we just keep adding things, and iterate over more and more elements, even those that are long removed from the graph? There is code that removes elements again from the list when calling `Node::destruct()` and `Compile::remove_useless_node()`: https://github.com/openjdk/jdk/blob/1f10ffba88119caab169b1fc43ccfd143e3b85a6/src/hotspot/share/opto/node.cpp#L612-L614 https://github.com/openjdk/jdk/blob/1f10ffba88119caab169b1fc43ccfd143e3b85a6/src/hotspot/share/opto/compile.cpp#L399-L401 > If there is no clear reason to change the code, I'd prefer to leave it as is ? I'm fine with both. 
But I think at some point we should decide to either go with one or the other solution and not have two ways of solving a similar problem :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1979075056 From epeter at openjdk.org Tue Mar 4 10:24:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 10:24:53 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v2] In-Reply-To: References: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> Message-ID: On Tue, 4 Mar 2025 10:05:03 GMT, Christian Hagedorn wrote: > I'm fine with both. But I think at some point we should decide to either go with one or the other solution and not have two ways of solving a similar problem :-) Right, I agree we should unify the code. >There is code that removes elements again from the list when calling Node::destruct() and Compile::remove_useless_node(): That's really a more complicated solution. Probably this is the best solution, which you mentioned earlier: >While thinking about this, it would have been great to have a class LoopOpts where we can store all such things that should be live over multiple loop opts pass but which is then not used anymore. Anyway, that's definitely out of scope. Then we can just clear the relevant arrays on start of each loop-opts phase. That way we do not have to remove individual nodes, and we can still reuse the allocated memory. 
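
The idea of clearing the lists at the start of each loop-opts phase can be sketched as below. This is purely illustrative: no `LoopOpts` class exists in HotSpot at the time of this thread, `std::vector` stands in for HotSpot's own growable arrays, and the names `LoopOptsSketch`, `NodeSketch` and `begin_phase` are invented for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node type; stands in for HotSpot's Node.
struct NodeSketch { int opcode; };

// Hypothetical holder for state that lives across several loop-opts
// phases but is not needed for the rest of the compilation.
class LoopOptsSketch {
  std::vector<NodeSketch*> _multiversion_opaque_nodes;
  std::vector<NodeSketch*> _zero_trip_guard_opaque_nodes;

 public:
  // Called at the start of each loop-opts phase. clear() drops all
  // entries, so individual nodes never have to be removed one by one.
  void begin_phase() {
    _multiversion_opaque_nodes.clear();
    _zero_trip_guard_opaque_nodes.clear();
  }

  void record_multiversion(NodeSketch* n) { _multiversion_opaque_nodes.push_back(n); }
  void record_zero_trip_guard(NodeSketch* n) { _zero_trip_guard_opaque_nodes.push_back(n); }

  std::size_t multiversion_count() const { return _multiversion_opaque_nodes.size(); }
  std::size_t multiversion_capacity() const { return _multiversion_opaque_nodes.capacity(); }
};
```

Because clearing does not release the backing storage, the next phase can refill the lists without a fresh allocation, which is the reuse benefit mentioned in the message.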
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1979106041 From chagedorn at openjdk.org Tue Mar 4 10:53:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 4 Mar 2025 10:53:52 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v2] In-Reply-To: References: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> Message-ID: On Tue, 4 Mar 2025 10:21:59 GMT, Emanuel Peter wrote: > Then we can just clear the relevant arrays on start of each loop-opts phase. That way we do not have to remove individual nodes, and we can still reuse the allocated memory. Yes, that would be ideal. Might be worth to further discuss this and what the impact would be. Anyway, I don't think we need to settle this now. Since zero trip guard opaque nodes already introduced a second way of doing it and there is no clear better way, I think we can use one or the other solution (your call which one) for now and visit this later again to unify the solution - hopefully with a new LoopOpts class :-) Is it worth to file an RFE now to keep track of that? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1979158705 From adinn at openjdk.org Tue Mar 4 11:42:03 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 4 Mar 2025 11:42:03 GMT Subject: RFR: 8350893: Use generated names for hand generated opto runtime blobs In-Reply-To: <4YAo_iY0X-5pGaf2xTi2Wk3rNPz0R7XRdAhZLmb2zR4=.4f56db7e-f9a1-4974-9f98-6c877f24b2cf@github.com> References: <4YAo_iY0X-5pGaf2xTi2Wk3rNPz0R7XRdAhZLmb2zR4=.4f56db7e-f9a1-4974-9f98-6c877f24b2cf@github.com> Message-ID: On Mon, 3 Mar 2025 15:51:42 GMT, Vladimir Kozlov wrote: >> @vnkozlov >>> It could be something changed in code generation but StubRoutines (finalstubs) is not helpful. >> >> Agreed. Although, I don't think that relates to this change. I am investigating. > >> Agreed. 
Although, I don't think that relates to this change. I am investigating.
> 
> Thank you. To be clear, it is not for these changes. I just thought you may know something since you are touching this code.

@vnkozlov I found the reason for the change in what gets printed and it is a bit ugly.

It relates to [JDK-8320272: Make method_entry_barrier address shared](https://bugs.openjdk.org/browse/JDK-8320272) which promoted the method_entry_barrier stub up into shared code. It included this change in the AArch64 barrier set:

       if (slow_path == nullptr) {
    -    __ movptr(rscratch1, (uintptr_t) StubRoutines::aarch64::method_entry_barrier());
    +    __ lea(rscratch1, RuntimeAddress(StubRoutines::method_entry_barrier()));
         __ blr(rscratch1);
         __ b(skip_barrier);

So, the disassembler now prints details of that address via `nmethod::reloc_string_for(u_char* begin, u_char* end)` which does this:

        case relocInfo::runtime_call_type:
        case relocInfo::runtime_call_w_cp_type: {
          stringStream st;
          st.print("runtime_call");
          CallRelocation* r = (CallRelocation*)iter.reloc();
          address dest = r->destination();
          CodeBlob* cb = CodeCache::find_blob(dest);
          if (cb != nullptr) {
            st.print(" %s", cb->name());
          } else {
            ResourceMark rm;
            const int buflen = 1024;
            char* buf = NEW_RESOURCE_ARRAY(char, buflen);
            . . .

i.e. it appends the blob name "StubRoutines (final_stubs)". My cleanup did not change that behaviour.

I'm not sure which jdk23 release you are using and how it gets the result you mentioned above. I built jdk-23-ga and it also appended the blob name when I printed an nmethod.

I also tried the latest jdk21 (Red_Hat-21.0.6.0.7-1). It did not print the runtime_call annotation, instead just showing the decimal value of the first mov constant:

    0x0000ffff5aacbc50:   mov   x8, #0x9980          // #39296
    0x0000ffff5aacbc54:   movk  x8, #0x5a57, lsl #16
    0x0000ffff5aacbc58:   movk  x8, #0xffff, lsl #32
    0x0000ffff5aacbc5c:   blr   x8

I think the "Stub::stub_name" format printout must occur in some intermediate version of the disassembler prior to the fix I mentioned above, where the mov;movk;movk;blr sequence is recognized as a call to a known address, as encoded by the three mov insns. There is a routine in the disassembler which adds a post-comment in that format when an address is recognized as belonging to a runtime blob.

I think we probably need to patch the disassembler reloc processing so that it recognises the four multi-stub (i.e. stub code generator) blobs as a special case and, rather than just appending the blob name, identifies which embedded stub code range the target address belongs to and prints that specific stub's name. I'll raise a separate JIRA for that.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23829#issuecomment-2697237725

From mli at openjdk.org Tue Mar 4 12:08:00 2025
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 4 Mar 2025 12:08:00 GMT
Subject: [jdk24] RFR: 8351033: RISC-V: TestFloat16ScalarOperations asserts with offset (4210) is too large to be patched in one beq/bge/bgeu/blt/bltu/bne instruction!
Message-ID: <7jdfTeNDeZGqy8ALh_PSz4xzkrxDKV9td0JrjYMaYhI=.22d91fe9-b94c-49da-8a44-501b2d7d39c6@github.com>

8351033: RISC-V: TestFloat16ScalarOperations asserts with offset (4210) is too large to be patched in one beq/bge/bgeu/blt/bltu/bne instruction!
------------- Commit messages: - Backport 79880e56375a1c17ec6ad29bb0ab01868bc956ff Changes: https://git.openjdk.org/jdk/pull/23894/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23894&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351033 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23894.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23894/head:pull/23894 PR: https://git.openjdk.org/jdk/pull/23894 From jbhateja at openjdk.org Tue Mar 4 12:14:27 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 4 Mar 2025 12:14:27 GMT Subject: RFR: 8351158: Incorrect APX EGPR register save ordering Message-ID: Currently, EGPR register save ordering[1] does not comply with the precomputed stack offsets[2]. This leads to incorrect register value reconstruction and various runtime clients using callee's RegisterMap like GC root set enumeration, de-optimization object reconstruction experience assertion failures due to unrecognizable oop pointer locations. This issue was discovered during our internal validation of SPECjvm2008 worklets with -XX:+UseAPX runtime flag using Intel SDE tool. Quick note on polling SafePoints :- SafePointNode at polling sites like method return or [outer] loop latches are different from the ones associated with Call sites as we do not spill caller saved registers before them, hence runtime handling for rootset enumeration to detect the oop pointer addresses in last activation solely relies on the RegisterMap populated by reading the RegisterSaver stack dumps. Kindly review and share your feedback. 
Best Regards, Jatin [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp#L268 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp#L105 ------------- Commit messages: - 8351158: Incorrect APX EGPR register save ordering Changes: https://git.openjdk.org/jdk/pull/23895/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23895&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351158 Stats: 31 lines in 1 file changed: 13 ins; 14 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23895.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23895/head:pull/23895 PR: https://git.openjdk.org/jdk/pull/23895 From mli at openjdk.org Tue Mar 4 12:20:04 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 4 Mar 2025 12:20:04 GMT Subject: [jdk24] Withdrawn: 8351033: RISC-V: TestFloat16ScalarOperations asserts with offset (4210) is too large to be patched in one beq/bge/bgeu/blt/bltu/bne instruction! In-Reply-To: <7jdfTeNDeZGqy8ALh_PSz4xzkrxDKV9td0JrjYMaYhI=.22d91fe9-b94c-49da-8a44-501b2d7d39c6@github.com> References: <7jdfTeNDeZGqy8ALh_PSz4xzkrxDKV9td0JrjYMaYhI=.22d91fe9-b94c-49da-8a44-501b2d7d39c6@github.com> Message-ID: On Tue, 4 Mar 2025 12:03:09 GMT, Hamlin Li wrote: > 8351033: RISC-V: TestFloat16ScalarOperations asserts with offset (4210) is too large to be patched in one beq/bge/bgeu/blt/bltu/bne instruction! This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/23894 From adinn at openjdk.org Tue Mar 4 12:21:57 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 4 Mar 2025 12:21:57 GMT Subject: RFR: 8350893: Use generated names for hand generated opto runtime blobs [v2] In-Reply-To: <3slACYq-NWB6drMXvVSvd10poQExmhZ-yGjSzoM01T8=.e0e01c24-6a30-480a-9331-8d04b3de8913@github.com> References: <3slACYq-NWB6drMXvVSvd10poQExmhZ-yGjSzoM01T8=.e0e01c24-6a30-480a-9331-8d04b3de8913@github.com> Message-ID: <7s8UpI4RbbmsDcq1bA7HyBpTkBoqxeNW6AjvqrXKmWA=.408853db-d0f4-419b-ba93-473ab9e2a352@github.com> On Mon, 3 Mar 2025 18:25:45 GMT, Aleksey Shipilev wrote: >> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: >> >> correct error in arm stub name > >> > Hm. It looks like in release builds all of these would be "runtime stub"? Is that expected? >> We could perhaps fix it to say something more useful. Maybe in a separate PR in case something depends on it having that value in release builds? > > I am confused that we have `OptoRuntime::stub_id(...)`, and then we also have newly added by you: > > > runtime.hpp: > > // Returns the name associated with a given stub id > static const char* stub_name(OptoStubId id) { > assert(id > OptoStubId::NO_STUBID && id < OptoStubId::NUM_STUBIDS, "stub id out of range"); > return _stub_names[(int)id]; > } > > > Maybe `OptoRuntime::stub_name` should be calling that one directly, instead of going all the way through `CodeCache::find_blob`? Then we don't need any debug-defines there. @shipilev > I am confused that we have OptoRuntime::stub_id(...), and then we also have newly added by you . . . I'm not sure what you mean here. I think you are referring to `OptoRuntime::stub_name(address entry)` and contrasting it with `OptoRuntime::stub_name(OptoStubId id)`? Is that right? If so ... 
The first method is currently used in a couple of places in the C2 compiler to label calls that 1) employ a constant target address and 2) are expected to target one of the opto runtime stubs (which also means a specific opto runtime blob since runtime blobs and stubs are 1-1). The stub id associated with the call is probably known -- or, if not, could certainly be coded explicitly -- some way up the call chain, sometimes in a direct caller, sometimes indirectly via a common helper (like the maths helpers). No matter how high up the chain you have to go, something is looking up that address using an explicit named accessor associated with an explicit stub id. However, as things are currently coded, the direct calls to that first method only know the call target address. The stub id is not available at the point of call and propagating a stub id or stub name down would require some refactoring. Indeed, the current mess actually involves propagating down an extra, hard-wired name (char*) that is used alongside the string returned by `stub_name(address)`. So, yes, I agree that this all really ought to be sorted out, allowing this method to be retired -- but not in this PR. Meanwhile, the second method is used for a very different purpose at points where a stub id definitely is known. It ensures that stub generation is primarily driven off the stub declaration or rather its corresponding id tag. All generator methods now retrieve the name using the id rather than hard-wiring it. This does not just apply to opto runtime stubs. There is an equivalent lookup scheme for shared runtime, c1 runtime and stubgen stubs.
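As a rough illustration of the id-driven lookup described above, here is a minimal, self-contained sketch. It is not the HotSpot code: `DEMO_STUBS_DO`, `DemoStubId`, `demo_stub_names` and `demo_stub_name` are hypothetical stand-ins, and the two entries are invented for the example. The point is only that a single declaration list can generate both the id tags and the name table, so a generator never needs a hard-wired string.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical declaration list: one entry per stub.
// (Stand-in for a shared declarations header; the names are made up.)
#define DEMO_STUBS_DO(do_stub) \
  do_stub(uncommon_trap)       \
  do_stub(exception)

// Id tags generated from the declaration list.
enum class DemoStubId : int {
  NO_STUBID = -1,
#define DEMO_ENUM_TAG(name) name##_id,
  DEMO_STUBS_DO(DEMO_ENUM_TAG)
#undef DEMO_ENUM_TAG
  NUM_STUBIDS
};

// Name table generated from the same list, so ids and names cannot drift apart.
static const char* const demo_stub_names[] = {
#define DEMO_NAME_STRING(name) "Demo " #name " stub",
  DEMO_STUBS_DO(DEMO_NAME_STRING)
#undef DEMO_NAME_STRING
};

// Returns the name associated with a given (hypothetical) stub id.
inline const char* demo_stub_name(DemoStubId id) {
  assert(id > DemoStubId::NO_STUBID && id < DemoStubId::NUM_STUBIDS &&
         "stub id out of range");
  return demo_stub_names[static_cast<int>(id)];
}
```

A generator would then call `demo_stub_name(DemoStubId::exception_id)` instead of passing a literal name, which is the property the second method is said to guarantee.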
------------- PR Comment: https://git.openjdk.org/jdk/pull/23829#issuecomment-2697361796 From adinn at openjdk.org Tue Mar 4 12:21:58 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 4 Mar 2025 12:21:58 GMT Subject: Integrated: 8350893: Use generated names for hand generated opto runtime blobs In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 18:00:37 GMT, Andrew Dinn wrote: > The two special case opto runtime blobs that support uncommon trap and exception handling are currently being generated using hard wired blob names determined by port-specific code. They should employ the standard blob names generated from shared declarations in file stubDeclarations.hpp. This pull request has now been integrated. Changeset: 7ee89a53 Author: Andrew Dinn URL: https://git.openjdk.org/jdk/commit/7ee89a53014bc3509271a81c62c91646f891e546 Stats: 29 lines in 9 files changed: 14 ins; 0 del; 15 mod 8350893: Use generated names for hand generated opto runtime blobs Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/23829 From epeter at openjdk.org Tue Mar 4 13:31:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 13:31:45 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v3] In-Reply-To: References: Message-ID: > @rwestrel asked me for this here: > https://github.com/openjdk/jdk/pull/22016#issuecomment-2684365921 > > The idea: an unused `multiversion_if` (i.e. one where the `slow_loop` is still delayed, i.e. where the `fast_loop` has not yet added a runtime-check to the `multiversion_if`) can be constant-folded if the main `fast_loop` disappears. Because at that point we know that we will never add a new condition to the `multiversion_if`, and it will constant fold to true (towards the `fast_loop`) after loop-opts anyway. 
> > It also seems to fix a bug, where all multiversioned loops (fast / slow, pre/main/post) disappear, and then we are left with a if-diamond with the multiversion_if. This then hits assertion code in `PhaseIdealLoop::conditional_move`. This issue is also addressed with this patch here. I adjusted the code around that assert slightly to give better reporting and ensure we bail out of the optimization if we see an unexpected pattern. Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - fix IR rules - add IR test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23865/files - new: https://git.openjdk.org/jdk/pull/23865/files/50f502ef..6610e0e8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23865&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23865&range=01-02 Stats: 65 lines in 1 file changed: 57 ins; 7 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23865.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23865/head:pull/23865 PR: https://git.openjdk.org/jdk/pull/23865 From epeter at openjdk.org Tue Mar 4 13:45:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 13:45:52 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v4] In-Reply-To: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> References: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> Message-ID: On Tue, 4 Mar 2025 08:31:14 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> rm condition > > test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java line 38: > >> 36: */ >> 37: >> 38: public class TestMultiversionRemoveUselessSlowLoop { > > Is it also possible to add an IR test where we can check that a 
`OpaqueMultiversioning` node is first in the graph and then removed? Maybe we can check that in loop opts phase n, `COUNTED_LOOP_MAIN` and `OpaqueMultiversioning` is present. Then in loop opts phase n + 1, `COUNTED_LOOP_MAIN` is removed and in n + 2, `OpaqueMultiversioning` is removed as well. But I'm not sure if this is reliable enough when some loop opts change - though could easily be fixed or dropped again if the test fails at some point. @chhagedorn I created an IR test, let me know if it looks good enough for you :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1979455969 From epeter at openjdk.org Tue Mar 4 13:45:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 13:45:51 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v4] In-Reply-To: References: Message-ID: > @rwestrel asked me for this here: > https://github.com/openjdk/jdk/pull/22016#issuecomment-2684365921 > > The idea: an unused `multiversion_if` (i.e. one where the `slow_loop` is still delayed, i.e. where the `fast_loop` has not yet added a runtime-check to the `multiversion_if`) can be constant-folded if the main `fast_loop` disappears. Because at that point we know that we will never add a new condition to the `multiversion_if`, and it will constant fold to true (towards the `fast_loop`) after loop-opts anyway. > > It also seems to fix a bug, where all multiversioned loops (fast / slow, pre/main/post) disappear, and then we are left with a if-diamond with the multiversion_if. This then hits assertion code in `PhaseIdealLoop::conditional_move`. This issue is also addressed with this patch here. I adjusted the code around that assert slightly to give better reporting and ensure we bail out of the optimization if we see an unexpected pattern. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: rm condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23865/files - new: https://git.openjdk.org/jdk/pull/23865/files/6610e0e8..66e71b73 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23865&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23865&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23865.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23865/head:pull/23865 PR: https://git.openjdk.org/jdk/pull/23865 From epeter at openjdk.org Tue Mar 4 13:45:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 13:45:51 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v4] In-Reply-To: References: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> Message-ID: On Tue, 4 Mar 2025 10:55:10 GMT, Christian Hagedorn wrote: >> I don't think that the input of an `IfTrue` is anything else than an `IfNode` (or a subclass) while being outside of IGVN. Otherwise, the graph is broken. Moreover, if we decide to call this method from IGVN at some point, we face another problem that predicate iteration is not safe during IGVN - and we probably do not want to make it safe if there is not a very strong reason for it. > > Unrelated thought: Would be great if we've had a flag "IGVN in progress" or something like that in `Compile` on which we can assert to make sure that methods, from which we know are not safe during IGVN, are not called during IGVN at some point by mistake. More often than not, we will only find out about that in some rare edge case where things are top at some unanticipated locations. The assert can be removed again once a method is required to be called from IGVN and is made safe (or we find that it was safe to begin with). 
Fine, I'll just remove the line :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1979453631 From epeter at openjdk.org Tue Mar 4 13:52:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 13:52:03 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v4] In-Reply-To: References: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> Message-ID: On Tue, 4 Mar 2025 10:50:59 GMT, Christian Hagedorn wrote: >>> I'm fine with both. But I think at some point we should decide to either go with one or the other solution and not have two ways of solving a similar problem :-) >> >> Right, I agree we should unify the code. >> >>>There is code that removes elements again from the list when calling Node::destruct() and Compile::remove_useless_node(): >> >> That's really a more complicated solution. Probably this is the best solution, which you mentioned earlier: >>>While thinking about this, it would have been great to have a class LoopOpts where we can store all such things that should be live over multiple loop opts pass but which is then not used anymore. Anyway, that's definitely out of scope. >> >> Then we can just clear the relevant arrays on start of each loop-opts phase. That way we do not have to remove individual nodes, and we can still reuse the allocated memory. > >> Then we can just clear the relevant arrays on start of each loop-opts phase. That way we do not have to remove individual nodes, and we can still reuse the allocated memory. > > Yes, that would be ideal. Might be worth to further discuss this and what the impact would be. > > Anyway, I don't think we need to settle this now. 
Since zero trip guard opaque nodes already introduced a second way of doing it and there is no clear better way, I think we can use one or the other solution (your call which one) for now and visit this later again to unify the solution - hopefully with a new LoopOpts class :-) Is it worth to file an RFE now to keep track of that? Filed: [JDK-8351170](https://bugs.openjdk.org/browse/JDK-8351170): C2 cleanup: unify eliminate_useless_... lists ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1979480411 From epeter at openjdk.org Tue Mar 4 13:57:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 13:57:48 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v5] In-Reply-To: References: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> Message-ID: On Tue, 4 Mar 2025 09:58:27 GMT, Christian Hagedorn wrote: >> It would be equivalent now, since nobody inherits from `BoolNode`. But if someone ever inherits from it, then this condition would be weaker. Also: I just copied the condition. >> >> I can do it if you still want me to do it, I'm on the fence with this one ;) > > I'm not sure if we were ever to extend `BoolNode`. And if so, I think implementing this requires to check all the `is_Bool()` conditions in our code base anyways - which I expect we have quite a lot of them. And it could be that a subclass would also allow to apply a cmove transformation. So, given that it's unlikely, I would be more inclined to change it to `is_Bool()`. But it's your call. 
I'm fine with both :-) I changed it :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1979494875 From epeter at openjdk.org Tue Mar 4 13:57:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 13:57:47 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v5] In-Reply-To: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> References: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> Message-ID: <0CthsCtmG4HrFMJh2eyn_BEwuxEl5DWeZhTUVdzeIhA=.7278b11b-fb57-4f8c-b027-f70272fe997c@github.com> On Tue, 4 Mar 2025 08:49:22 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> updated Bool check > > A few comments, otherwise the fix idea looks good to me! @chhagedorn Thanks for all the suggestions! Let me know if there is anything else you would like me to do ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23865#issuecomment-2697688603 From epeter at openjdk.org Tue Mar 4 13:57:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 13:57:46 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v5] In-Reply-To: References: Message-ID: > @rwestrel asked me for this here: > https://github.com/openjdk/jdk/pull/22016#issuecomment-2684365921 > > The idea: an unused `multiversion_if` (i.e. one where the `slow_loop` is still delayed, i.e. where the `fast_loop` has not yet added a runtime-check to the `multiversion_if`) can be constant-folded if the main `fast_loop` disappears. Because at that point we know that we will never add a new condition to the `multiversion_if`, and it will constant fold to true (towards the `fast_loop`) after loop-opts anyway. 
> > It also seems to fix a bug, where all multiversioned loops (fast / slow, pre/main/post) disappear, and then we are left with a if-diamond with the multiversion_if. This then hits assertion code in `PhaseIdealLoop::conditional_move`. This issue is also addressed with this patch here. I adjusted the code around that assert slightly to give better reporting and ensure we bail out of the optimization if we see an unexpected pattern. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: updated Bool check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23865/files - new: https://git.openjdk.org/jdk/pull/23865/files/66e71b73..8aa59aa3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23865&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23865&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23865.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23865/head:pull/23865 PR: https://git.openjdk.org/jdk/pull/23865 From thartmann at openjdk.org Tue Mar 4 14:00:58 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 4 Mar 2025 14:00:58 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 13:27:00 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]` as a >> consequence, `i` is known to be `[0, arraySize-1]`. The range check >> can be eliminated. 
>> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. >> >> What I propose here is that when the `CastII` becomes dead, any CFG >> paths that use the `CastII` node is made unreachable. So in pseudo code: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered. >> >> The patch applies this to all `Type` nodes as with 8275202, I also ran >> in some rare corner cases with other types of nodes. The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). >> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII)` >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out of bound value occurs. >> >> This is likely fixable some other way (eventhough it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that they are more hiding), I think it would >> make sense to try some other way of approaching them. 
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review FTR, I did execute testing and it looked all clean. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2697705164 From chagedorn at openjdk.org Tue Mar 4 14:08:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 4 Mar 2025 14:08:06 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v5] In-Reply-To: References: <4QN20sD3J0aY9aCCt0SfJbaKY3WlhF18oYf76EWTbZU=.04468ebf-64d8-47e1-89d9-36d7226f81a1@github.com> Message-ID: On Tue, 4 Mar 2025 13:39:55 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java line 38: >> >>> 36: */ >>> 37: >>> 38: public class TestMultiversionRemoveUselessSlowLoop { >> >> Is it also possible to add an IR test where we can check that a `OpaqueMultiversioning` node is first in the graph and then removed? Maybe we can check that in loop opts phase n, `COUNTED_LOOP_MAIN` and `OpaqueMultiversioning` is present. Then in loop opts phase n + 1, `COUNTED_LOOP_MAIN` is removed and in n + 2, `OpaqueMultiversioning` is removed as well. But I'm not sure if this is reliable enough when some loop opts change - though could easily be fixed or dropped again if the test fails at some point. > > @chhagedorn I created an IR test, let me know if it looks good enough for you :) Nice test! Thanks for taking the extra effort. I only have a small suggestion for it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1979524288 From chagedorn at openjdk.org Tue Mar 4 14:08:05 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 4 Mar 2025 14:08:05 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v5] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 13:57:46 GMT, Emanuel Peter wrote: >> @rwestrel asked me for this here: >> https://github.com/openjdk/jdk/pull/22016#issuecomment-2684365921 >> >> The idea: an unused `multiversion_if` (i.e. one where the `slow_loop` is still delayed, i.e. where the `fast_loop` has not yet added a runtime-check to the `multiversion_if`) can be constant-folded if the main `fast_loop` disappears. Because at that point we know that we will never add a new condition to the `multiversion_if`, and it will constant fold to true (towards the `fast_loop`) after loop-opts anyway. >> >> It also seems to fix a bug, where all multiversioned loops (fast / slow, pre/main/post) disappear, and then we are left with a if-diamond with the multiversion_if. This then hits assertion code in `PhaseIdealLoop::conditional_move`. This issue is also addressed with this patch here. I adjusted the code around that assert slightly to give better reporting and ensure we bail out of the optimization if we see an unexpected pattern. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > updated Bool check Otherwise, the update looks good, thanks! test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java line 60: > 58: "post .* multiversion_fast", "= 2", > 59: "multiversion_delayed_slow", "= 2", // both have the delayed slow_loop > 60: "multiversion", "= 8"}, // nothing unexpected Should we also add a match on `IRNode.OPAQUE_MULTIVERSIONING`? ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23865#pullrequestreview-2657687930 PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1979521253 From epeter at openjdk.org Tue Mar 4 14:24:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 14:24:31 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v6] In-Reply-To: References: Message-ID: > @rwestrel asked me for this here: > https://github.com/openjdk/jdk/pull/22016#issuecomment-2684365921 > > The idea: an unused `multiversion_if` (i.e. one where the `slow_loop` is still delayed, i.e. where the `fast_loop` has not yet added a runtime-check to the `multiversion_if`) can be constant-folded if the main `fast_loop` disappears. Because at that point we know that we will never add a new condition to the `multiversion_if`, and it will constant fold to true (towards the `fast_loop`) after loop-opts anyway. > > It also seems to fix a bug, where all multiversioned loops (fast / slow, pre/main/post) disappear, and then we are left with a if-diamond with the multiversion_if. This then hits assertion code in `PhaseIdealLoop::conditional_move`. This issue is also addressed with this patch here. I adjusted the code around that assert slightly to give better reporting and ensure we bail out of the optimization if we see an unexpected pattern. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: improve IR rules for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23865/files - new: https://git.openjdk.org/jdk/pull/23865/files/8aa59aa3..7b2e4ce6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23865&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23865&range=04-05 Stats: 12 lines in 2 files changed: 9 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23865.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23865/head:pull/23865 PR: https://git.openjdk.org/jdk/pull/23865 From epeter at openjdk.org Tue Mar 4 14:24:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 14:24:31 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v5] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 14:04:12 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> updated Bool check > > test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java line 60: > >> 58: "post .* multiversion_fast", "= 2", >> 59: "multiversion_delayed_slow", "= 2", // both have the delayed slow_loop >> 60: "multiversion", "= 8"}, // nothing unexpected > > Should we also add a match on `IRNode.OPAQUE_MULTIVERSIONING`? 
Done :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1979561037 From epeter at openjdk.org Tue Mar 4 16:01:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Mar 2025 16:01:17 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v7] In-Reply-To: References: Message-ID: > @rwestrel asked me for this here: > https://github.com/openjdk/jdk/pull/22016#issuecomment-2684365921 > > The idea: an unused `multiversion_if` (i.e. one where the `slow_loop` is still delayed, i.e. where the `fast_loop` has not yet added a runtime-check to the `multiversion_if`) can be constant-folded if the main `fast_loop` disappears. Because at that point we know that we will never add a new condition to the `multiversion_if`, and it will constant fold to true (towards the `fast_loop`) after loop-opts anyway. > > It also seems to fix a bug, where all multiversioned loops (fast / slow, pre/main/post) disappear, and then we are left with a if-diamond with the multiversion_if. This then hits assertion code in `PhaseIdealLoop::conditional_move`. This issue is also addressed with this patch here. I adjusted the code around that assert slightly to give better reporting and ensure we bail out of the optimization if we see an unexpected pattern. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: pre loop not reliably folded, adapt IR rule ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23865/files - new: https://git.openjdk.org/jdk/pull/23865/files/7b2e4ce6..b5fddc3e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23865&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23865&range=05-06 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23865.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23865/head:pull/23865 PR: https://git.openjdk.org/jdk/pull/23865 From eastigeevich at openjdk.org Tue Mar 4 17:37:40 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 4 Mar 2025 17:37:40 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache Message-ID: This benchmark is used to check the performance impact of the code cache being sparse. We use the C2 compiler to compile the same Java method multiple times to produce as much code as needed. The Java method is not trivial. It adds two 40-digit positive integers. These compiled methods represent the active methods in the code cache. We split the active methods into groups. We put each group into a fixed-size code region. We align each code region to its size. The CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods.
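The region alignment mentioned above can be sketched as follows. This is only an illustration under assumptions: `align_up` and `kRegionSize` are hypothetical names, the 2M value is the code region size quoted with the results, and the benchmark's actual placement logic is not shown in this message.

```cpp
#include <cassert>
#include <cstdint>

// Assumed region size: 2M (2097152) bytes, as quoted with the results.
constexpr std::uintptr_t kRegionSize = 2097152;

// Rounds addr up to the next multiple of size.
// Only valid when size is a power of two.
constexpr std::uintptr_t align_up(std::uintptr_t addr, std::uintptr_t size) {
  return (addr + size - 1) & ~(size - 1);
}

// Start address of the region that would hold group number `group`,
// given an arbitrary (hypothetical) base address.
constexpr std::uintptr_t region_start(std::uintptr_t base, std::uintptr_t group) {
  return align_up(base, kRegionSize) + group * kRegionSize;
}
```

With regions laid out this way, a group holding only a few methods leaves most of its 2M region empty, which is the sparsity the benchmark is meant to exercise.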
Results: code region size 2M (2097152) bytes

- Intel Xeon Platinum 8259CL

| activeMethodCount | groupCount | Methods/Group | Score | Error | Units | Diff |
|--- |--- |--- |--- |--- |--- |--- |
| 128 | 1 | 128 | 19.577 | 0.619 | us/op | |
| 128 | 32 | 4 | 22.968 | 0.314 | us/op | 17.30% |
| 128 | 48 | 3 | 22.245 | 0.388 | us/op | 13.60% |
| 128 | 64 | 2 | 23.874 | 0.84 | us/op | 21.90% |
| 128 | 80 | 2 | 23.786 | 0.231 | us/op | 21.50% |
| 128 | 96 | 1 | 26.224 | 1.16 | us/op | 34% |
| 128 | 112 | 1 | 27.028 | 0.461 | us/op | 38.10% |
| 256 | 1 | 256 | 47.43 | 1.146 | us/op | |
| 256 | 32 | 8 | 63.962 | 1.671 | us/op | 34.90% |
| 256 | 48 | 5 | 63.396 | 0.247 | us/op | 33.70% |
| 256 | 64 | 4 | 66.604 | 2.286 | us/op | 40.40% |
| 256 | 80 | 3 | 59.746 | 1.273 | us/op | 26% |
| 256 | 96 | 3 | 63.836 | 1.034 | us/op | 34.60% |
| 256 | 112 | 2 | 63.538 | 1.814 | us/op | 34% |
| 512 | 1 | 512 | 172.731 | 4.409 | us/op | |
| 512 | 32 | 16 | 206.772 | 6.229 | us/op | 19.70% |
| 512 | 48 | 11 | 215.275 | 2.228 | us/op | 24.60% |
| 512 | 64 | 8 | 212.962 | 2.028 | us/op | 23.30% |
| 512 | 80 | 6 | 201.335 | 12.519 | us/op | 16.60% |
| 512 | 96 | 5 | 198.133 | 6.502 | us/op | 14.70% |
| 512 | 112 | 5 | 193.739 | 3.812 | us/op | 12.20% |
| 768 | 1 | 768 | 325.154 | 5.048 | us/op | |
| 768 | 32 | 24 | 346.298 | 20.196 | us/op | 6.50% |
| 768 | 48 | 16 | 350.746 | 2.931 | us/op | 7.90% |
| 768 | 64 | 12 | 339.445 | 7.927 | us/op | 4.40% |
| 768 | 80 | 10 | 347.408 | 7.355 | us/op | 6.80% |
| 768 | 96 | 8 | 340.983 | 3.578 | us/op | 4.90% |
| 768 | 112 | 7 | 353.949 | 2.98 | us/op | 8.90% |
| 1024 | 1 | 1024 | 368.352 | 5.961 | us/op | |
| 1024 | 32 | 32 | 463.822 | 6.274 | us/op | 25.90% |
| 1024 | 48 | 21 | 457.674 | 15.144 | us/op | 24.20% |
| 1024 | 64 | 16 | 477.694 | 0.986 | us/op | 29.70% |
| 1024 | 80 | 13 | 484.901 | 32.601 | us/op | 31.60% |
| 1024 | 96 | 11 | 480.8 | 27.088 | us/op | 30.50% |
| 1024 | 112 | 9 | 474.416 | 10.053 | us/op | 28.80% |

- AArch64 Neoverse N1

| activeMethodCount | groupCount | Methods/Group | Score | Error | Units | Diff |
|--- |--- |--- |--- |--- |--- |--- |
| 128 | 1 | 128 | 25.297 | 0.792 | us/op | |
| 128 | 32 | 4 | 31.451 | 0.455 | us/op | 24.30% |
| 128 | 48 | 3 | 30.641 | 0.663 | us/op | 21.10% |
| 128 | 64 | 2 | 31.742 | 0.433 | us/op | 25.50% |
| 128 | 80 | 2 | 31.867 | 0.719 | us/op | 26% |
| 128 | 96 | 1 | 32.741 | 0.358 | us/op | 29.40% |
| 128 | 112 | 1 | 35.679 | 0.638 | us/op | 41% |
| 256 | 1 | 256 | 54.577 | 1.478 | us/op | |
| 256 | 32 | 8 | 69.756 | 1.771 | us/op | 27.80% |
| 256 | 48 | 5 | 69.276 | 0.317 | us/op | 26.90% |
| 256 | 64 | 4 | 71.583 | 2.446 | us/op | 31.20% |
| 256 | 80 | 3 | 74.121 | 2.521 | us/op | 35.80% |
| 256 | 96 | 3 | 74.21 | 0.632 | us/op | 36% |
| 256 | 112 | 2 | 76.15 | 2.681 | us/op | 39.50% |
| 512 | 1 | 512 | 206.98 | 1.35 | us/op | |
| 512 | 32 | 16 | 204.413 | 4.111 | us/op | -1.20% |
| 512 | 48 | 11 | 211.315 | 5.066 | us/op | 2.10% |
| 512 | 64 | 8 | 224.012 | 2.78 | us/op | 8.20% |
| 512 | 80 | 6 | 209.903 | 3.291 | us/op | 1.40% |
| 512 | 96 | 5 | 213.318 | 5.401 | us/op | 3.10% |
| 512 | 112 | 5 | 210.134 | 6.171 | us/op | 1.50% |
| 768 | 1 | 768 | 354.851 | 6.912 | us/op | |
| 768 | 32 | 24 | 364.047 | 12.096 | us/op | 2.60% |
| 768 | 48 | 16 | 381.982 | 9.478 | us/op | 7.60% |
| 768 | 64 | 12 | 389.904 | 20.204 | us/op | 9.90% |
| 768 | 80 | 10 | 385.125 | 11.627 | us/op | 8.50% |
| 768 | 96 | 8 | 377.831 | 8.263 | us/op | 6.50% |
| 768 | 112 | 7 | 388.252 | 3.558 | us/op | 9.40% |
| 1024 | 1 | 1024 | 430.501 | 5.762 | us/op | |
| 1024 | 32 | 32 | 498.758 | 16.065 | us/op | 15.90% |
| 1024 | 48 | 21 | 507.239 | 4.676 | us/op | 17.80% |
| 1024 | 64 | 16 | 529.827 | 25.531 | us/op | 23.10% |
| 1024 | 80 | 13 | 537.753 | 18.643 | us/op | 24.90% |
| 1024 | 96 | 11 | 557.753 | 7.804 | us/op | 29.60% |
| 1024 | 112 | 9 | 544.645 | 20.507 | us/op | 26.50% |

In the case of 128 active methods and 112 groups the code sparsity (r11c per 1000 instructions) value was 1.09. For 128 active methods and 1 group this value was 0.0001. According to https://github.com/aws/aws-graviton-getting-started/blob/main/perfrunbook/debug_hw_perf.md, a number >0.5 indicates the code being executed by the CPU is very sparse.

- AArch64 Neoverse V1

| activeMethodCount | groupCount | Methods/Group | Score | Error | Units | Diff |
|--- |--- |--- |--- |--- |--- |--- |
| 128 | 1 | 128 | 16.356 | 0.239 | us/op | |
| 128 | 32 | 4 | 26.53 | 0.71 | us/op | 62.20% |
| 128 | 48 | 3 | 26.501 | 1.792 | us/op | 62% |
| 128 | 64 | 2 | 27.727 | 1.128 | us/op | 69.50% |
| 128 | 80 | 2 | 27.872 | 1.346 | us/op | 70.40% |
| 128 | 96 | 1 | 27.795 | 0.958 | us/op | 69.90% |
| 128 | 112 | 1 | 28.315 | 0.695 | us/op | 73.10% |
| 256 | 1 | 256 | 55.325 | 1.74 | us/op | |
| 256 | 32 | 8 | 88.366 | 2.968 | us/op | 59.70% |
| 256 | 48 | 5 | 93.082 | 0.539 | us/op | 68.20% |
| 256 | 64 | 4 | 97.154 | 2.865 | us/op | 75.60% |
| 256 | 80 | 3 | 102.005 | 5.147 | us/op | 84.40% |
| 256 | 96 | 3 | 99.049 | 4.068 | us/op | 79% |
| 256 | 112 | 2 | 101.099 | 1.467 | us/op | 82.70% |
| 512 | 1 | 512 | 149.965 | 3.813 | us/op | |
| 512 | 32 | 16 | 191.49 | 4.07 | us/op | 27.70% |
| 512 | 48 | 11 | 201.375 | 3.384 | us/op | 34.30% |
| 512 | 64 | 8 | 204.789 | 3.964 | us/op | 36.60% |
| 512 | 80 | 6 | 203.223 | 3.236 | us/op | 35.50% |
| 512 | 96 | 5 | 223.094 | 3.022 | us/op | 48.80% |
| 512 | 112 | 5 | 220.352 | 3.431 | us/op | 46.90% |
| 768 | 1 | 768 | 266.406 | 5.179 | us/op | |
| 768 | 32 | 24 | 290.236 | 10.351 | us/op | 8.90% |
| 768 | 48 | 16 | 293.058 | 8.69 | us/op | 10% |
| 768 | 64 | 12 | 297.037 | 6.729 | us/op | 11.50% |
| 768 | 80 | 10 | 311.171 | 2.136 | us/op | 16.80% |
| 768 | 96 | 8 | 313.311 | 5.015 | us/op | 17.60% |
| 768 | 112 | 7 | 316.534 | 8.885 | us/op | 18.80% |
| 1024 | 1 | 1024 | 383.712 | 3.717 | us/op | |
| 1024 | 32 | 32 | 379.525 | 8.701 | us/op | -1.10% |
| 1024 | 48 | 21 | 388.86 | 12.566 | us/op | 1.30% |
| 1024 | 64 | 16 | 398.676 | 13.699 | us/op | 3.90% |
| 1024 | 80 | 13 | 410.646 | 1.688 | us/op | 7% |
| 1024 | 96 | 11 | 407.945 | 10.952 | us/op | 6.30% |
| 1024 | 112 | 9 | 408.161 | 17.233 | us/op | 6.40% |

The worst case for Graviton 3, 256 active methods and 112 groups, and ~83% regression, had the code sparsity value 0.6 vs 0.00002 when all 256 methods were in one group.
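
For readers checking the numbers: the Diff column appears to be the relative change in Score against the single-group row with the same activeMethodCount. That reading is an assumption inferred from the values, not something stated in the message. A minimal sketch, with the hypothetical helper name `diff_percent`:

```cpp
#include <cassert>
#include <cmath>

// Relative change of `score` against `baseline`, in percent.
// Positive values mean the sparse layout was slower than the baseline.
inline double diff_percent(double baseline, double score) {
  assert(baseline > 0.0);
  return (score - baseline) / baseline * 100.0;
}
```

For example, the Intel row with 128 active methods and 32 groups gives `diff_percent(19.577, 22.968)`, roughly 17.3, which matches the reported 17.30%.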
------------- Commit messages: - Simplify benchmark code - 8350852: Implement JMH benchmark for sparse CodeCache Changes: https://git.openjdk.org/jdk/pull/23831/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23831&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350852 Stats: 311 lines in 1 file changed: 311 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23831.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23831/head:pull/23831 PR: https://git.openjdk.org/jdk/pull/23831 From eastigeevich at openjdk.org Tue Mar 4 17:37:40 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 4 Mar 2025 17:37:40 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 22:23:23 GMT, Evgeny Astigeevich wrote: > This benchmark is used to check performance impact of the code cache being sparse. > > We use C2 compiler to compile the same Java method multiple times to produce as many code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods. 
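
The grouping described above can be sketched roughly as follows. This is an illustration only, not the benchmark's code: the class and method names are hypothetical, the rounding of methods per group is inferred from the Methods/Group column of the results, and the address arithmetic assumes the region size is a power of two (as 2M is).

```java
// Hypothetical illustration of the layout described above; not taken from the PR.
final class CodeRegionLayout {
    // Code region size used for the reported results: 2M (2097152) bytes.
    static final long REGION_SIZE = 2L * 1024 * 1024;

    // Methods per group, rounded to nearest, as the Methods/Group column suggests.
    static long methodsPerGroup(int activeMethodCount, int groupCount) {
        return Math.round((double) activeMethodCount / groupCount);
    }

    // Round an address up to the next multiple of a power-of-two region size,
    // so that a region is aligned by its own size.
    static long alignUp(long address, long regionSize) {
        return (address + regionSize - 1) & -regionSize;
    }

    // Fraction of a region occupied by code; values well below 1.0 mean a sparse region.
    static double fillRatio(long codeBytesInRegion, long regionSize) {
        return (double) codeBytesInRegion / regionSize;
    }

    public static void main(String[] args) {
        System.out.println(methodsPerGroup(128, 48));
        System.out.println(Long.toHexString(alignUp(0x12345678L, REGION_SIZE)));
    }
}
```

The fewer methods a group holds, the smaller the fill ratio of its region, which is the sense in which the CodeCache becomes sparse.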
> > Results: code region size 2M (2097152) bytes > - Intel Xeon Platinum 8259CL > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |19.577 |0.619 |us/op | | > |128 |32 |4 |22.968 |0.314 |us/op |17.30% | > |128 |48 |3 |22.245 |0.388 |us/op |13.60% | > |128 |64 |2 |23.874 |0.84 |us/op |21.90% | > |128 |80 |2 |23.786 |0.231 |us/op |21.50% | > |128 |96 |1 |26.224 |1.16 |us/op |34% | > |128 |112 |1 |27.028 |0.461 |us/op |38.10% | > |256 |1 |256 |47.43 |1.146 |us/op | | > |256 |32 |8 |63.962 |1.671 |us/op |34.90% | > |256 |48 |5 |63.396 |0.247 |us/op |33.70% | > |256 |64 |4 |66.604 |2.286 |us/op |40.40% | > |256 |80 |3 |59.746 |1.273 |us/op |26% | > |256 |96 |3 |63.836 |1.034 |us/op |34.60% | > |256 |112 |2 |63.538 |1.814 |us/op |34% | > |512 |1 |512 |172.731 |4.409 |us/op | | > |512 |32 |16 |206.772 |6.229 |us/op |19.70% | > |512 |48 |11 |215.275 |2.228 |us/op |24.60% | > |512 |64 |8 |212.962 |2.028 |us/op |23.30% | > |512 |80 |6 |201.335 |12.519 |us/op |16.60% | > |512 |96 |5 |198.133 |6.502 |us/op |14.70% | > |512 |112 |5 |193.739 |3.812 |us/op |12.20% | > |768 |1 |768 |325.154 |5.048 |us/op | | > |768 |32 |24 |346.298 |20.196 |us/op |6.50% | > |768 |48 |16 |350.746 |2.931 |us/op |7.90% | > |768 |64 |12 |339.445 |7.927 |us/op |4.40% | > |768 |80 |10 |347.408 |7.355 |us/op |6.80% | > |768 |96 |8 |340.983 |3.578 |us/op |4.90% | > |768 |112 |7 |353.949 |2.98 |us/op |8.90% | > |1024 |1 |1024 |368.352 |5.961 |us/op | | > |1024 |32 |32 |463.822 |6.274 |us/op |25.90% | > |1024 |48 |21 |457.674 |15.144 |us/op |24.20% | > |1024 |64 |16 |477.694 |0.986 |us/op |29.70% | > |1024 |80 |13 |484.901 |32.601 |us/op |31.60% | > |1024 |96 |11 |480.8 |27.088 |us/op |30.50% | > |1024 |112 |9 |474.416 |10.053 |us/op |28.80% | > > - AArch64 Neoverse N1 > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |25.297 |0.792 |us/op | | > 
|128 |32 |4 |31.451... Hi @vnkozlov, I'd appreciate if you take a look at this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2698426952 From kvn at openjdk.org Tue Mar 4 18:10:04 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 4 Mar 2025 18:10:04 GMT Subject: RFR: 8350893: Use generated names for hand generated opto runtime blobs [v2] In-Reply-To: <7s8UpI4RbbmsDcq1bA7HyBpTkBoqxeNW6AjvqrXKmWA=.408853db-d0f4-419b-ba93-473ab9e2a352@github.com> References: <3slACYq-NWB6drMXvVSvd10poQExmhZ-yGjSzoM01T8=.e0e01c24-6a30-480a-9331-8d04b3de8913@github.com> <7s8UpI4RbbmsDcq1bA7HyBpTkBoqxeNW6AjvqrXKmWA=.408853db-d0f4-419b-ba93-473ab9e2a352@github.com> Message-ID: On Tue, 4 Mar 2025 12:17:18 GMT, Andrew Dinn wrote: >>> > Hm. It looks like in release builds all of these would be "runtime stub"? Is that expected? >>> We could perhaps fix it to say something more useful. Maybe in a separate PR in case something depends on it having that value in release builds? >> >> I am confused that we have `OptoRuntime::stub_id(...)`, and then we also have newly added by you: >> >> >> runtime.hpp: >> >> // Returns the name associated with a given stub id >> static const char* stub_name(OptoStubId id) { >> assert(id > OptoStubId::NO_STUBID && id < OptoStubId::NUM_STUBIDS, "stub id out of range"); >> return _stub_names[(int)id]; >> } >> >> >> Maybe `OptoRuntime::stub_name` should be calling that one directly, instead of going all the way through `CodeCache::find_blob`? Then we don't need any debug-defines there. > > @shipilev >> I am confused that we have OptoRuntime::stub_id(...), and then we also have newly added by you . . . > > I'm not sure what you mean here. I think you are referring to `OptoRuntime::stub_name(address entry)` and contrasting it with `OptoRuntime::stub_name(OptoStubId id)`? Is that right? If so ... 
>
> The first method is currently used in a couple of places in the C2 compiler to label calls that 1) employ a constant target address and 2) are expected to target one of the opto runtime stubs (which also means a specific opto runtime blob since runtime blobs and stubs are 1-1). The stub id associated with the call is probably known -- or, if not, could certainly be coded explicitly -- some way up the call chain, sometimes in a direct caller, sometimes indirectly via a common helper (like the maths helpers). No matter how high up the chain you have to go, something is looking up that address using an explicit named accessor associated with an explicit stub id.
>
> However, as things are currently coded the direct calls to that first method only know the call target address. The stub id is not available at the point of call and propagating a stub id or stub name down would require some refactoring. Indeed, the current mess actually involves propagating down an extra, hard-wired name (char*) that is used alongside the string returned by `stub_name(address)`. So, yes, I agree that all really ought to be sorted out, allowing this method to be retired -- but not in this PR.
>
> Meanwhile, the second method is used for a very different purpose at points where a stub id definitely is known. It ensures that stub generation is primarily driven off the stub declaration or rather its corresponding id tag. All generator methods now retrieve the name using the id rather than hard-wiring it. This does not just apply to opto runtime stubs. There is an equivalent lookup scheme for shared runtime, c1 runtime and stubgen stubs.

Thank you, @adinn, for investigating relocation info printouts. My bad, I compared output running old Leyden JDK where I added additional code for relocations printouts: https://github.com/vnkozlov/jdk/commit/4d739c91f1ca3409a5aed114c725485fed4dacc4 Maybe I should push this into mainline.

------------- PR Comment: https://git.openjdk.org/jdk/pull/23829#issuecomment-2698495370 From kvn at openjdk.org Tue Mar 4 18:13:59 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 4 Mar 2025 18:13:59 GMT Subject: RFR: 8351158: Incorrect APX EGPR register save ordering In-Reply-To: References: Message-ID: <9H2ztg_cUZEoXY3jxXF7Ik36LhSUv2k4J1e2-hts0uk=.21f93567-e8a0-4008-9a50-f75169a746c0@github.com> On Tue, 4 Mar 2025 12:08:57 GMT, Jatin Bhateja wrote: > Currently, EGPR register save ordering[1] does not comply with the precomputed stack offsets[2]. This leads to incorrect register value reconstruction and various runtime clients using callee's RegisterMap like GC root set enumeration, de-optimization object reconstruction experience assertion failures due to unrecognizable oop pointer locations. > > This issue was discovered during our internal validation of SPECjvm2008 worklets with -XX:+UseAPX runtime flag using Intel SDE tool. > > Quick note on polling SafePoints :- > SafePointNode at polling sites like method return or [outer] loop latches are different from the ones associated with Call sites as we do not spill caller saved registers before them, hence runtime handling for rootset enumeration to detect the oop pointer addresses in last activation solely relies on the RegisterMap populated by reading the RegisterSaver stack dumps. > > Kindly review and share your feedback. > > Best Regards, > Jatin > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp#L268 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp#L105 Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23895#pullrequestreview-2658528715 From dlong at openjdk.org Tue Mar 4 18:14:00 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 4 Mar 2025 18:14:00 GMT Subject: RFR: 8351036: [JVMCI] value not an s2: -32776 In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 09:18:40 GMT, Doug Simon wrote: > This PR adds support for JVMCI to install code that requires stack slots whose offset > `Short.MAX_VALUE`. Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23888#pullrequestreview-2658525441 From pchilanomate at openjdk.org Tue Mar 4 18:26:10 2025 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 4 Mar 2025 18:26:10 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v5] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Tue, 4 Mar 2025 04:56:24 GMT, Dean Long wrote: >> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. >> >> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. >> >> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. 
> > Dean Long has updated the pull request incrementally with two additional commits since the last revision: > > - fix typo > - moved and hopefully improved invokedynamic comment Marked as reviewed by pchilanomate (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23557#pullrequestreview-2658565622 From sviswanathan at openjdk.org Tue Mar 4 18:45:02 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 4 Mar 2025 18:45:02 GMT Subject: RFR: 8351158: Incorrect APX EGPR register save ordering In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 12:08:57 GMT, Jatin Bhateja wrote: > Currently, EGPR register save ordering[1] does not comply with the precomputed stack offsets[2]. This leads to incorrect register value reconstruction and various runtime clients using callee's RegisterMap like GC root set enumeration, de-optimization object reconstruction experience assertion failures due to unrecognizable oop pointer locations. > > This issue was discovered during our internal validation of SPECjvm2008 worklets with -XX:+UseAPX runtime flag using Intel SDE tool. > > Quick note on polling SafePoints :- > SafePointNode at polling sites like method return or [outer] loop latches are different from the ones associated with Call sites as we do not spill caller saved registers before them, hence runtime handling for rootset enumeration to detect the oop pointer addresses in last activation solely relies on the RegisterMap populated by reading the RegisterSaver stack dumps. > > Kindly review and share your feedback. > > Best Regards, > Jatin > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp#L268 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp#L105 Looks good to me as well. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23895#pullrequestreview-2658624329 From kvn at openjdk.org Tue Mar 4 18:50:05 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 4 Mar 2025 18:50:05 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v7] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 16:01:17 GMT, Emanuel Peter wrote: >> @rwestrel asked me for this here: >> https://github.com/openjdk/jdk/pull/22016#issuecomment-2684365921 >> >> The idea: an unused `multiversion_if` (i.e. one where the `slow_loop` is still delayed, i.e. where the `fast_loop` has not yet added a runtime-check to the `multiversion_if`) can be constant-folded if the main `fast_loop` disappears. Because at that point we know that we will never add a new condition to the `multiversion_if`, and it will constant fold to true (towards the `fast_loop`) after loop-opts anyway. >> >> It also seems to fix a bug, where all multiversioned loops (fast / slow, pre/main/post) disappear, and then we are left with a if-diamond with the multiversion_if. This then hits assertion code in `PhaseIdealLoop::conditional_move`. This issue is also addressed with this patch here. I adjusted the code around that assert slightly to give better reporting and ensure we bail out of the optimization if we see an unexpected pattern. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > pre loop not reliably folded, adapt IR rule src/hotspot/share/opto/loopnode.cpp line 4566: > 4564: if (lpt->_child == nullptr && lpt->is_counted()) { > 4565: CountedLoopNode* head = lpt->_head->as_CountedLoop(); > 4566: if (head->is_main_loop() && head->is_multiversion_fast_loop()) { Can any other than `main` loop marked as `is_multiversion_fast_loop`? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1980035523 From kvn at openjdk.org Tue Mar 4 20:02:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 4 Mar 2025 20:02:51 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache In-Reply-To: References: Message-ID: <7NWMXhWKPkDLaKODxoPOwFbonDxYO3m4FVtWa27hkzk=.07ca722d-cf02-48c7-b57f-c7c5e5bf2a6c@github.com> On Tue, 4 Mar 2025 17:34:41 GMT, Evgeny Astigeevich wrote: >> This benchmark is used to check performance impact of the code cache being sparse. >> >> We use C2 compiler to compile the same Java method multiple times to produce as many code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods. 
>> >> Results: code region size 2M (2097152) bytes >> - Intel Xeon Platinum 8259CL >> >> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | >> |--- |--- |--- |--- |--- |--- |--- | >> |128 |1 |128 |19.577 |0.619 |us/op | | >> |128 |32 |4 |22.968 |0.314 |us/op |17.30% | >> |128 |48 |3 |22.245 |0.388 |us/op |13.60% | >> |128 |64 |2 |23.874 |0.84 |us/op |21.90% | >> |128 |80 |2 |23.786 |0.231 |us/op |21.50% | >> |128 |96 |1 |26.224 |1.16 |us/op |34% | >> |128 |112 |1 |27.028 |0.461 |us/op |38.10% | >> |256 |1 |256 |47.43 |1.146 |us/op | | >> |256 |32 |8 |63.962 |1.671 |us/op |34.90% | >> |256 |48 |5 |63.396 |0.247 |us/op |33.70% | >> |256 |64 |4 |66.604 |2.286 |us/op |40.40% | >> |256 |80 |3 |59.746 |1.273 |us/op |26% | >> |256 |96 |3 |63.836 |1.034 |us/op |34.60% | >> |256 |112 |2 |63.538 |1.814 |us/op |34% | >> |512 |1 |512 |172.731 |4.409 |us/op | | >> |512 |32 |16 |206.772 |6.229 |us/op |19.70% | >> |512 |48 |11 |215.275 |2.228 |us/op |24.60% | >> |512 |64 |8 |212.962 |2.028 |us/op |23.30% | >> |512 |80 |6 |201.335 |12.519 |us/op |16.60% | >> |512 |96 |5 |198.133 |6.502 |us/op |14.70% | >> |512 |112 |5 |193.739 |3.812 |us/op |12.20% | >> |768 |1 |768 |325.154 |5.048 |us/op | | >> |768 |32 |24 |346.298 |20.196 |us/op |6.50% | >> |768 |48 |16 |350.746 |2.931 |us/op |7.90% | >> |768 |64 |12 |339.445 |7.927 |us/op |4.40% | >> |768 |80 |10 |347.408 |7.355 |us/op |6.80% | >> |768 |96 |8 |340.983 |3.578 |us/op |4.90% | >> |768 |112 |7 |353.949 |2.98 |us/op |8.90% | >> |1024 |1 |1024 |368.352 |5.961 |us/op | | >> |1024 |32 |32 |463.822 |6.274 |us/op |25.90% | >> |1024 |48 |21 |457.674 |15.144 |us/op |24.20% | >> |1024 |64 |16 |477.694 |0.986 |us/op |29.70% | >> |1024 |80 |13 |484.901 |32.601 |us/op |31.60% | >> |1024 |96 |11 |480.8 |27.088 |us/op |30.50% | >> |1024 |112 |9 |474.416 |10.053 |us/op |28.80% | >> >> - AArch64 Neoverse N1 >> >> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff |... 
> > Hi @vnkozlov, > > I'd appreciate if you take a look at this. @eastig Can you make compiled code of different/random size to be more representative for real application? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2698768879 From dnsimon at openjdk.org Tue Mar 4 20:14:03 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 4 Mar 2025 20:14:03 GMT Subject: RFR: 8351036: [JVMCI] value not an s2: -32776 In-Reply-To: References: Message-ID: <-tI6hRLLVFZKckI0dXweArTpvkkuppQ-UCe7QCP204M=.7071b95a-b2bd-43fd-8593-47a3e0711a98@github.com> On Tue, 4 Mar 2025 09:18:40 GMT, Doug Simon wrote: > This PR adds support for JVMCI to install code that requires stack slots whose offset > `Short.MAX_VALUE`. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23888#issuecomment-2698788687 From dnsimon at openjdk.org Tue Mar 4 20:14:04 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 4 Mar 2025 20:14:04 GMT Subject: Integrated: 8351036: [JVMCI] value not an s2: -32776 In-Reply-To: References: Message-ID: <13cAPTn_ilQ-6cQXLy7mta5wV4zczVRsYdpRe5RqnWw=.be50128c-0f43-4bca-8cd8-5a01b51b1c34@github.com> On Tue, 4 Mar 2025 09:18:40 GMT, Doug Simon wrote: > This PR adds support for JVMCI to install code that requires stack slots whose offset > `Short.MAX_VALUE`. This pull request has now been integrated. 
Changeset: a21302bb Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/a21302bb3244b85dd9809c42d1c0fd502bd677cc Stats: 44 lines in 4 files changed: 36 ins; 0 del; 8 mod 8351036: [JVMCI] value not an s2: -32776 Reviewed-by: yzheng, dlong ------------- PR: https://git.openjdk.org/jdk/pull/23888 From eastigeevich at openjdk.org Tue Mar 4 21:47:52 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 4 Mar 2025 21:47:52 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 17:34:41 GMT, Evgeny Astigeevich wrote: >> This benchmark is used to check performance impact of the code cache being sparse. >> >> We use C2 compiler to compile the same Java method multiple times to produce as many code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods. 
>> >> Results: code region size 2M (2097152) bytes >> - Intel Xeon Platinum 8259CL >> >> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | >> |--- |--- |--- |--- |--- |--- |--- | >> |128 |1 |128 |19.577 |0.619 |us/op | | >> |128 |32 |4 |22.968 |0.314 |us/op |17.30% | >> |128 |48 |3 |22.245 |0.388 |us/op |13.60% | >> |128 |64 |2 |23.874 |0.84 |us/op |21.90% | >> |128 |80 |2 |23.786 |0.231 |us/op |21.50% | >> |128 |96 |1 |26.224 |1.16 |us/op |34% | >> |128 |112 |1 |27.028 |0.461 |us/op |38.10% | >> |256 |1 |256 |47.43 |1.146 |us/op | | >> |256 |32 |8 |63.962 |1.671 |us/op |34.90% | >> |256 |48 |5 |63.396 |0.247 |us/op |33.70% | >> |256 |64 |4 |66.604 |2.286 |us/op |40.40% | >> |256 |80 |3 |59.746 |1.273 |us/op |26% | >> |256 |96 |3 |63.836 |1.034 |us/op |34.60% | >> |256 |112 |2 |63.538 |1.814 |us/op |34% | >> |512 |1 |512 |172.731 |4.409 |us/op | | >> |512 |32 |16 |206.772 |6.229 |us/op |19.70% | >> |512 |48 |11 |215.275 |2.228 |us/op |24.60% | >> |512 |64 |8 |212.962 |2.028 |us/op |23.30% | >> |512 |80 |6 |201.335 |12.519 |us/op |16.60% | >> |512 |96 |5 |198.133 |6.502 |us/op |14.70% | >> |512 |112 |5 |193.739 |3.812 |us/op |12.20% | >> |768 |1 |768 |325.154 |5.048 |us/op | | >> |768 |32 |24 |346.298 |20.196 |us/op |6.50% | >> |768 |48 |16 |350.746 |2.931 |us/op |7.90% | >> |768 |64 |12 |339.445 |7.927 |us/op |4.40% | >> |768 |80 |10 |347.408 |7.355 |us/op |6.80% | >> |768 |96 |8 |340.983 |3.578 |us/op |4.90% | >> |768 |112 |7 |353.949 |2.98 |us/op |8.90% | >> |1024 |1 |1024 |368.352 |5.961 |us/op | | >> |1024 |32 |32 |463.822 |6.274 |us/op |25.90% | >> |1024 |48 |21 |457.674 |15.144 |us/op |24.20% | >> |1024 |64 |16 |477.694 |0.986 |us/op |29.70% | >> |1024 |80 |13 |484.901 |32.601 |us/op |31.60% | >> |1024 |96 |11 |480.8 |27.088 |us/op |30.50% | >> |1024 |112 |9 |474.416 |10.053 |us/op |28.80% | >> >> - AArch64 Neoverse N1 >> >> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff |... 
>
> Hi @vnkozlov,
>
> I'd appreciate if you take a look at this.

> @eastig Can you make compiled code of different/random size to be more representative for real application?

Yes, I can do this. I think the size of compiled code is not very important. What is important is the time spent in an invoked nmethod. I can add a benchmark where there will be a random distribution of nmethods of different sizes. For the current benchmark of nmethods of the same size, I can add a parameter causing them to run for different times. What do you think?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2699008207

From kvn at openjdk.org Tue Mar 4 22:43:08 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 4 Mar 2025 22:43:08 GMT
Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v13]
In-Reply-To:
References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com>
Message-ID:

On Thu, 27 Feb 2025 14:31:31 GMT, Boris Ulasevich wrote:

>> This change relocates mutable data (such as relocations, metadata, jvmci data) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data out of the CodeCache.
>>
>> OOPs was initially moved to a new mutable data blob, but then moved back to nmethod due to performance issues on dacapo benchmarks on aarch with ShenandoahGC (why Shenandoah: it is the only GC with supports_instruction_patching=false - it requires loading from the oops table in compiled code, which takes three instructions for a remote data).
>>
>> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark.
>>
>> The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod.
Example (statistics collected on the CodeCacheStress benchmark):
>> - nmethod_count:134000, total_compilation_time: 510460ms
>> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms,
>> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB
>>
>> Functional testing: jtreg on arm/aarch/x86.
>> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks.
>>
>> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data.
>
> Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits:
>
> - cleanup
> - returning oops back to nmethods. jtreg: Ok, performance: Ok. todo: cleanup
> - Address review comments: cleanup, move fields to avoid padding, fix CodeBlob purge to call os::free, fix nmethod::print, update Layout description
> - add a separate adrp_movk function to to support targets located more than 4GB away
> - Force the use of movk in combination with adrp and ldr instructions to address scenarios
>   where os::malloc allocates buffers beyond the typical ±4GB range accessible with adrp
> - Fixing TestFindInstMemRecursion test fail with XX:+StressReflectiveCode option:
>   _relocation_size can exceed 64Kb, in this case _metadata_offset do not fit into int16.
>   Fix: use _oops_size int16 field to calculate metadata offset
> - removing dead code
> - a bit of cleanup and addressing review suggestions
> - rework movoop for not_supports_instruction_patching case: correcting in ldr_constant and relocations fixup
> - remove _code_end_offset
> - ... and 4 more: https://git.openjdk.org/jdk/compare/3c9d64eb...56c0cc78

This looks much better.

Please swap `metadata` and `jvmci data` in outputs according to data layout (metadata before jvmci data). It is in `print_nmethod_stats()` and `print_on_impl()`.

Also please merge latest JDK which have SA cleanup related to compilers: https://github.com/openjdk/jdk/pull/23782 ------------- PR Review: https://git.openjdk.org/jdk/pull/21276#pullrequestreview-2659265431 From dlong at openjdk.org Tue Mar 4 23:14:20 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 4 Mar 2025 23:14:20 GMT Subject: Integrated: 8336042: Caller/callee param size mismatch in deoptimization causes crash In-Reply-To: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: <-DOch4HtWdaPuapC0aBkemPls96miNahQpfaqEwSyog=.4e7dddaf-6517-4b84-88f2-5833fb19054c@github.com> On Tue, 11 Feb 2025 07:59:01 GMT, Dean Long wrote: > When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. > > In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. > > Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. This pull request has now been integrated. 

Changeset: 20ea218c
Author: Dean Long
URL: https://git.openjdk.org/jdk/commit/20ea218ce52f79704445acfe2d4a3dc9d04e86d2
Stats: 161 lines in 11 files changed: 147 ins; 3 del; 11 mod

8336042: Caller/callee param size mismatch in deoptimization causes crash

Co-authored-by: Richard Reingruber
Reviewed-by: pchilanomate, rrich, vlivanov, never

-------------

PR: https://git.openjdk.org/jdk/pull/23557

From sviswanathan at openjdk.org Wed Mar 5 01:10:58 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Wed, 5 Mar 2025 01:10:58 GMT
Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 7 Feb 2025 21:53:41 GMT, Srinivas Vamsi Parasa wrote:

>> The goal of this PR is to generate code using APX NDD instructions.
>>
>> **Please note:** I'm on vacation till March 3rd. Responses to the PR comments will be delayed until March 4th. Thank You for your understanding!
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
> revert to nf version for {pop/tz/lz}cnt count instructions

src/hotspot/cpu/x86/x86_64.ad line 4398:

> 4396: %}
> 4397: ins_encode %{
> 4398: __ eshrl($dst$$Register, $mem$$Address, markWord::klass_shift_at_offset, false);

This change could be done as part of the loadNKlassCompactHeaders instruct itself, as no additional register is needed. Something like below:

    if (UseAPX) {
      __ eshrl($dst$$Register, $mem$$Address, markWord::klass_shift_at_offset, false);
    } else {
      __ movl($dst$$Register, $mem$$Address);
      __ shrl($dst$$Register, markWord::klass_shift_at_offset);
    }

src/hotspot/cpu/x86/x86_64.ad line 5573:

> 5571: ins_pipe(ialu_reg);
> 5572: %}
> 5573:

This instruct could be removed, as this is already a unary operation with a separate destination. Likewise, other unary operator instructs could also be removed where the destination is already separate from the source.

src/hotspot/cpu/x86/x86_64.ad line 5587: > 5585: > 5586: instruct countLeadingZerosI_mem_nf(rRegI dst, memory src) %{ > 5587: predicate(UseAPX && UseCountLeadingZerosInstruction); This instruct could be removed as this is already an unary operation with separate destination, ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1980081435 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1980098430 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1980104739 From jbhateja at openjdk.org Wed Mar 5 01:37:13 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 5 Mar 2025 01:37:13 GMT Subject: RFR: 8351158: Incorrect APX EGPR register save ordering In-Reply-To: <9H2ztg_cUZEoXY3jxXF7Ik36LhSUv2k4J1e2-hts0uk=.21f93567-e8a0-4008-9a50-f75169a746c0@github.com> References: <9H2ztg_cUZEoXY3jxXF7Ik36LhSUv2k4J1e2-hts0uk=.21f93567-e8a0-4008-9a50-f75169a746c0@github.com> Message-ID: On Tue, 4 Mar 2025 18:11:40 GMT, Vladimir Kozlov wrote: >> Currently, EGPR register save ordering[1] does not comply with the precomputed stack offsets[2]. This leads to incorrect register value reconstruction and various runtime clients using callee's RegisterMap like GC root set enumeration, de-optimization object reconstruction experience assertion failures due to unrecognizable oop pointer locations. >> >> This issue was discovered during our internal validation of SPECjvm2008 worklets with -XX:+UseAPX runtime flag using Intel SDE tool. >> >> Quick note on polling SafePoints :- >> SafePointNode at polling sites like method return or [outer] loop latches are different from the ones associated with Call sites as we do not spill caller saved registers before them, hence runtime handling for rootset enumeration to detect the oop pointer addresses in last activation solely relies on the RegisterMap populated by reading the RegisterSaver stack dumps. >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> Jatin >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp#L268 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp#L105 > > Good. Thanks @vnkozlov , @sviswa7 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23895#issuecomment-2699484421 From jbhateja at openjdk.org Wed Mar 5 01:37:13 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 5 Mar 2025 01:37:13 GMT Subject: Integrated: 8351158: Incorrect APX EGPR register save ordering In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 12:08:57 GMT, Jatin Bhateja wrote: > Currently, EGPR register save ordering[1] does not comply with the precomputed stack offsets[2]. This leads to incorrect register value reconstruction and various runtime clients using callee's RegisterMap like GC root set enumeration, de-optimization object reconstruction experience assertion failures due to unrecognizable oop pointer locations. > > This issue was discovered during our internal validation of SPECjvm2008 worklets with -XX:+UseAPX runtime flag using Intel SDE tool. > > Quick note on polling SafePoints :- > SafePointNode at polling sites like method return or [outer] loop latches are different from the ones associated with Call sites as we do not spill caller saved registers before them, hence runtime handling for rootset enumeration to detect the oop pointer addresses in last activation solely relies on the RegisterMap populated by reading the RegisterSaver stack dumps. > > Kindly review and share your feedback. > > Best Regards, > Jatin > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp#L268 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp#L105 This pull request has now been integrated. 
Changeset: 62fa33a8 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/62fa33a8704aef9fd08a8221f4fde217ab749dfc Stats: 31 lines in 1 file changed: 13 ins; 14 del; 4 mod 8351158: Incorrect APX EGPR register save ordering Reviewed-by: kvn, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/23895
From fyang at openjdk.org Wed Mar 5 06:44:03 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 5 Mar 2025 06:44:03 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value [v13] In-Reply-To: <6XEmUiapz_UElQlM-x5g61YOp2DSqSh3b0Vdq1jsWx8=.c7cd0945-47e4-49cb-bbd4-e6b7bc06c743@github.com> References: <6XEmUiapz_UElQlM-x5g61YOp2DSqSh3b0Vdq1jsWx8=.c7cd0945-47e4-49cb-bbd4-e6b7bc06c743@github.com> Message-ID: On Wed, 26 Feb 2025 12:22:46 GMT, kuaiwei wrote:
>> This patch enhances the MergeStores optimization to support merging values with reversed byte order.
>>
>> Below are the benchmark results before and after the patch:
>>
>> On aliyun g8y (aarch64)
>> |name | before | score2 | ratio |
>> |---|---|---|---|
>> |MergeStoreBench.setCharBS |5669.655000 |5669.566000 | 0.00 %|
>> |MergeStoreBench.setCharBV |5516.911000 |5516.273000 | 0.01 %|
>> |MergeStoreBench.setCharC |5578.644000 |5552.809000 | 0.47 %|
>> |MergeStoreBench.setCharLS |5782.140000 |5779.264000 | 0.05 %|
>> |MergeStoreBench.setCharLV |5496.403000 |5499.195000 | -0.05 %|
>> |MergeStoreBench.setIntB |6087.703000 |2768.385000 | 119.90 %|
>> |MergeStoreBench.setIntBU |6733.813000 |2950.240000 | 128.25 %|
>> |MergeStoreBench.setIntBV |1362.233000 |1361.821000 | 0.03 %|
>> |MergeStoreBench.setIntL |2834.785000 |2833.042000 | 0.06 %|
>> |MergeStoreBench.setIntLU |2947.145000 |2946.874000 | 0.01 %|
>> |MergeStoreBench.setIntLV |5506.791000 |5506.229000 | 0.01 %|
>> |MergeStoreBench.setIntRB |7634.279000 |5611.058000 | 36.06 %|
>> |MergeStoreBench.setIntRBU |7766.737000 |5551.281000 | 39.91 %|
>> |MergeStoreBench.setIntRL |5689.793000 |5689.385000 | 0.01 %|
>> |MergeStoreBench.setIntRLU |5628.287000 |5628.789000 | -0.01 %|
>> |MergeStoreBench.setIntRU |5536.039000 |5534.910000 | 0.02 %|
>> |MergeStoreBench.setIntU |5595.363000 |5567.810000 | 0.49 %|
>> |MergeStoreBench.setLongB |13722.671000 |6811.098000 | 101.48 %|
>> |MergeStoreBench.setLongBU |13728.844000 |4280.240000 | 220.75 %|
>> |MergeStoreBench.setLongBV |2785.255000 |2785.949000 | -0.02 %|
>> |MergeStoreBench.setLongL |5714.615000 |5710.402000 | 0.07 %|
>> |MergeStoreBench.setLongLU |4128.746000 |4129.324000 | -0.01 %|
>> |MergeStoreBench.setLongLV |2793.125000 |2794.438000 | -0.05 %|
>> |MergeStoreBench.setLongRB |14465.223000 |7015.050000 | 106.20 %|
>> |MergeStoreBench.setLongRBU |14546.954000 |6173.210000 | 135.65 %|
>> |MergeStoreBench.setLongRL |6816.145000 |6813.348000 | 0.04 %|
>> |MergeStoreBench.setLongRLU |4289.445000 |4284.239000 | 0.12 %|
>> |MergeStoreBench.setLongRU |3132.471000 |3133.093000 | -0.02 %|
>> |MergeStoreBench.setLongU |3086.779000 |3087.298000 | -0.02 %|
>>
>> AMD EPYC 9T24
>> ...
>
> kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits:
>
> - Merge remote-tracking branch 'origin/master' into pr/merge_stores_reverse
> - Add readable comment
> - Fix for review comments
> - Allow ValueOrder::Reverse on big-endian platforms
> - Revert "Merge more stores"
>
> This reverts commit 1e1113ed02ec5a9fe181f215d5667e8de487fe47.
> - Revert "Fix test502aBE"
>
> This reverts commit f773fa368577c4f67957c4d40968c5c45e3ae205.
> - Fix test502aBE
> - Merge more stores
> - Remove an useless assertion
> - Remove tailing white space
> - ...
and 9 more: https://git.openjdk.org/jdk/compare/aac9cb45...b3243a56 test/hotspot/jtreg/compiler/c2/TestMergeStores.java line 810: > 808: IRNode.REVERSE_BYTES_L, "1"}, > 809: applyIf = {"UseUnalignedAccesses", "true"}, > 810: applyIfPlatformAnd = {"little-endian", "true", "riscv64", "false"}) // Exclude RISCV64 because ReverseBytes are not supported Seems to me the code comment could be made more accurate. In fact, all the `ReverseBytes` variants will be available on riscv64 if we have the Zbb extension [1]. And I see the newly added IR tests in this test also work on such platforms. I wonder if there is an easy way to enable these IR tests for these platforms. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv_b.ad#L181 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1980792097 From chagedorn at openjdk.org Wed Mar 5 06:45:56 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Mar 2025 06:45:56 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v8] In-Reply-To: References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: On Tue, 4 Mar 2025 07:09:34 GMT, Daniel Lundén wrote: >> When searching for load anti-dependences in GCM, the memory state for the load is sometimes represented not only by the memory node input of the load, but also by other memory nodes. Because PhaseCFG::insert_anti_dependences searches for anti-dependences only from the load's memory input, it is, therefore, possible to sometimes overlook anti-dependences. The result is that loads are potentially scheduled too late, after stores that redefine the memory states of the loads. >> >> ### Changeset >> >> It is not yet clear why multiple nodes sometimes represent the memory state of a load, nor if this is expected.
We can, however, resolve all the miscompiled test cases seen in this issue by improving the idealization of Phi nodes. Specifically, there is an idealization where we split Phis through input MergeMems, that we, prior to this changeset, applied too conservatively. >> >> To illustrate the idealization and how it resolves this issue, consider the example below. >> >> ![failure-graph-1](https://github.com/user-attachments/assets/ecbd204f-bdf0-49cb-a62e-8081d08cfe0c) >> >> `64 membar_release` is a critical anti-dependence for `183 loadI`. The anti-dependence search starts at the load's direct memory input, `107 Phi`, and stops immediately at Phis. Therefore, the search ends at `106 Phi` and we never find `64 membar_release`. >> >> We can apply the split-through-MergeMem Phi idealization to `119 Phi`. This idealization pushes `119 Phi` through `120 MergeMem` and `121 MergeMem`, splitting it into the individual inputs of the MergeMems in the process. As a result, we replace `119 Phi` with two new Phis. One of these generated Phis has identical inputs to `107 Phi` (`106 Phi` and `104 Phi`), and further idealizations will merge this new Phi and `107 Phi`. As a result, `107 Phi` then has a Phi-free path to `64 membar_release` and we correctly discover the anti-dependence. >> >> The changeset consists of the following changes. >> - Add an analysis that allows applying the split-through-MergeMem idealization in more cases than before (including in the above example) while still ensuring termination. >> - Add a missing `ResourceMark` in `PhiNode::split_out_instance`. >> - Add multiple new regression tests in `TestGCMLoadPlacement.java`. >> >> For reference, [here](https://github.com/openjdk/jdk/pull/22852) is a previous PR with an alternative fix that we decided to discard in favor of the fix in this PR. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/ac... 
> > Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into insert-anti-dependences-8333393+igvn+pr > - Update missing copyright > - Change to GrowableArray > - Update after Christian's review > - Fix subtle bug introduced in previous update > - Update after review comments > - Remove test that no longer reproduces the issue > - First version Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23691#pullrequestreview-2660112639 From epeter at openjdk.org Wed Mar 5 07:05:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Mar 2025 07:05:56 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v7] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 18:47:00 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> pre loop not reliably folded, adapt IR rule > > src/hotspot/share/opto/loopnode.cpp line 4566: > >> 4564: if (lpt->_child == nullptr && lpt->is_counted()) { >> 4565: CountedLoopNode* head = lpt->_head->as_CountedLoop(); >> 4566: if (head->is_main_loop() && head->is_multiversion_fast_loop()) { > > Can any loop other than `main` be marked as `is_multiversion_fast_loop`? Yes!
See the attached test's IR rule: @IR(counts = {"pre .* multiversion_fast", "= 2", // regular pre-main-post for both loops "main .* multiversion_fast", "= 2", "post .* multiversion_fast", "= 2", "multiversion_delayed_slow", "= 2", // both have the delayed slow_loop "multiversion", "= 8", // nothing unexpected IRNode.OPAQUE_MULTIVERSIONING, "= 2"}, // Both multiversion_if are still here I'll add a comment to the code for more explanation :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23865#discussion_r1980815226 From epeter at openjdk.org Wed Mar 5 07:21:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Mar 2025 07:21:12 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v8] In-Reply-To: References: Message-ID: > @rwestrel asked me for this here: > https://github.com/openjdk/jdk/pull/22016#issuecomment-2684365921 > > The idea: an unused `multiversion_if` (i.e. one where the `slow_loop` is still delayed, i.e. where the `fast_loop` has not yet added a runtime-check to the `multiversion_if`) can be constant-folded if the main `fast_loop` disappears. Because at that point we know that we will never add a new condition to the `multiversion_if`, and it will constant fold to true (towards the `fast_loop`) after loop-opts anyway. > > It also seems to fix a bug, where all multiversioned loops (fast / slow, pre/main/post) disappear, and then we are left with a if-diamond with the multiversion_if. This then hits assertion code in `PhaseIdealLoop::conditional_move`. This issue is also addressed with this patch here. I adjusted the code around that assert slightly to give better reporting and ensure we bail out of the optimization if we see an unexpected pattern. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: add more comments for Vladimir ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23865/files - new: https://git.openjdk.org/jdk/pull/23865/files/b5fddc3e..c53bff59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23865&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23865&range=06-07 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23865.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23865/head:pull/23865 PR: https://git.openjdk.org/jdk/pull/23865 From epeter at openjdk.org Wed Mar 5 07:21:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Mar 2025 07:21:12 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v7] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 16:01:17 GMT, Emanuel Peter wrote: >> @rwestrel asked me for this here: >> https://github.com/openjdk/jdk/pull/22016#issuecomment-2684365921 >> >> The idea: an unused `multiversion_if` (i.e. one where the `slow_loop` is still delayed, i.e. where the `fast_loop` has not yet added a runtime-check to the `multiversion_if`) can be constant-folded if the main `fast_loop` disappears. Because at that point we know that we will never add a new condition to the `multiversion_if`, and it will constant fold to true (towards the `fast_loop`) after loop-opts anyway. >> >> It also seems to fix a bug, where all multiversioned loops (fast / slow, pre/main/post) disappear, and then we are left with a if-diamond with the multiversion_if. This then hits assertion code in `PhaseIdealLoop::conditional_move`. This issue is also addressed with this patch here. I adjusted the code around that assert slightly to give better reporting and ensure we bail out of the optimization if we see an unexpected pattern. 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > pre loop not reliably folded, adapt IR rule @vnkozlov Thanks for having a look! I added some more comments, hopefully that helps :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23865#issuecomment-2700070807 From epeter at openjdk.org Wed Mar 5 07:46:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Mar 2025 07:46:07 GMT Subject: RFR: 8348657: compiler/loopopts/superword/TestEquivalentInvariants.java timed out Message-ID: We have experienced higher runtime with extra flags (e.g. `-XComp` and some verification flags). I'm increasing the timeout. I checked: it does not seem that any specific method takes much more time than others, I suspect that we do not inline well with `-XComp`, and that makes the `MemorySegment` loop code significantly slower. ------------- Commit messages: - JDK-8348657 Changes: https://git.openjdk.org/jdk/pull/23914/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23914&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8348657 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23914.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23914/head:pull/23914 PR: https://git.openjdk.org/jdk/pull/23914 From thartmann at openjdk.org Wed Mar 5 08:34:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 5 Mar 2025 08:34:51 GMT Subject: RFR: 8348657: compiler/loopopts/superword/TestEquivalentInvariants.java timed out In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 07:36:13 GMT, Emanuel Peter wrote: > We have experienced higher runtime with extra flags (e.g. `-XComp` and some verification flags). I'm increasing the timeout. > > I checked: it does not seem that any specific method takes much more time than others, I suspect that we do not inline well with `-XComp`, and that makes the `MemorySegment` loop code significantly slower. 
Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23914#pullrequestreview-2660329738 From adinn at openjdk.org Wed Mar 5 09:17:04 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 5 Mar 2025 09:17:04 GMT Subject: RFR: 8350893: Use generated names for hand generated opto runtime blobs [v2] In-Reply-To: References: <3slACYq-NWB6drMXvVSvd10poQExmhZ-yGjSzoM01T8=.e0e01c24-6a30-480a-9331-8d04b3de8913@github.com> <7s8UpI4RbbmsDcq1bA7HyBpTkBoqxeNW6AjvqrXKmWA=.408853db-d0f4-419b-ba93-473ab9e2a352@github.com> Message-ID: On Tue, 4 Mar 2025 18:07:13 GMT, Vladimir Kozlov wrote: >> @shipilev >>> I am confused that we have OptoRuntime::stub_id(...), and then we also have newly added by you . . . >> >> I'm not sure what you mean here. I think you are referring to `OptoRuntime::stub_name(address entry)` and contrasting it with `OptoRuntime::stub_name(OptoStubId id)`? Is that right? If so ... >> >> The first method is currently used in a couple of places in the C2 compiler to label calls that 1) employ a constant target address and 2) are expected to target one of the opto runtime stub (which also means a specific opto runtime blob since runtime blobs and stubs are 1-1). The stub id associated with the call is probably known -- or, if not, could certainly be coded explicitly -- some way up the call chain, sometimes in a direct caller, sometimes indirectly via a common helper (like the maths helpers). No matter high up the chain you have to go something is looking up that address using an explicit named accessor associated with an explicit stub id. >> >> However, as things are currently coded the direct calls to that first method only know the call target address. The stub id is not available at the point of call and propagating a stub id or stub name down would require some refactoring. 
Indeed, the current mess actually involves propagating down an extra, hard-wired name (char*) that is used alongside the string returned by `stub_name(address)`. So, yes, I agree that all really ought to be sorted out, allowing this method to be retired -- but not in this PR. >> >> Meanwhile, the second method is used for a very different purpose at points where a stub id definitely is known. It ensures that stub generation is primarily driven off the stub declaration or rather its corresponding id tag. All generator methods now retrieve the name using the id rather than hard-wiring it. This does not just apply for opto runtime stubs. There is an equivalent lookup scheme for shared runtime, c1 runtime and stubgen stubs. > > Thank you, @adinn, for investigating relocation info printouts. My bad, I compared output running old Leyden JDK where I added additional code for relocations printouts: https://github.com/vnkozlov/jdk/commit/4d739c91f1ca3409a5aed114c725485fed4dacc4 > > May be I should push this into mainline. @vnkozlov Ah ok, that explains why I was not seeing the same output. Yes, I agree this ought to go into mainline. I raised issue [JDK-8351256](https://bugs.openjdk.org/browse/JDK-8351256) and PR [23915](https://github.com/openjdk/jdk/pull/23915) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23829#issuecomment-2700314351 From adinn at openjdk.org Wed Mar 5 09:19:08 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 5 Mar 2025 09:19:08 GMT Subject: RFR: 8351256: Improve printing of runtime call stub names in disassember output Message-ID: Fixes printing of runtime stub call targets in disassembler listings. 
------------- Commit messages: - 8351256: Improve printing of runtime call stub names in disassember ouptut Changes: https://git.openjdk.org/jdk/pull/23915/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23915&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351256 Stats: 37 lines in 2 files changed: 36 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23915.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23915/head:pull/23915 PR: https://git.openjdk.org/jdk/pull/23915 From adinn at openjdk.org Wed Mar 5 09:22:51 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 5 Mar 2025 09:22:51 GMT Subject: RFR: 8351256: Improve printing of runtime call stub names in disassember output In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 09:13:02 GMT, Andrew Dinn wrote: > Fixes printing of runtime stub call targets in disassembler listings. Verified by eyeball: $ java -XX:CompileCommand='print,java.lang.Object::' Hello CompileCommand: print java/lang/Object. bool print = true . . . Compiled method (c1) 98 1 3 java.lang.Object:: (1 bytes) . . . 0x0000ffff670001b8: cmp x8, x9 0x0000ffff670001bc: b.eq 0x0000ffff670001d8 // b.none ;; 0xFFFF6E59EAC0 0x0000ffff670001c0: mov x8, #0xeac0 // #60096 ; {runtime_call Stub::method_entry_barrier} 0x0000ffff670001c4: movk x8, #0x6e59, lsl #16 0x0000ffff670001c8: movk x8, #0xffff, lsl #32 0x0000ffff670001cc: blr x8 . . . ------------- PR Comment: https://git.openjdk.org/jdk/pull/23915#issuecomment-2700328666 From epeter at openjdk.org Wed Mar 5 10:04:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Mar 2025 10:04:03 GMT Subject: RFR: 8348657: compiler/loopopts/superword/TestEquivalentInvariants.java timed out In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 08:32:08 GMT, Tobias Hartmann wrote: >> We have experienced higher runtime with extra flags (e.g. `-XComp` and some verification flags). I'm increasing the timeout. 
>> >> I checked: it does not seem that any specific method takes much more time than others, I suspect that we do not inline well with `-XComp`, and that makes the `MemorySegment` loop code significantly slower. > > Looks good and trivial. @TobiHartmann Thanks for the review. I agree it is trivial. I performed some sanity testing, and it passes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23914#issuecomment-2700431303 From epeter at openjdk.org Wed Mar 5 10:04:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Mar 2025 10:04:03 GMT Subject: Integrated: 8348657: compiler/loopopts/superword/TestEquivalentInvariants.java timed out In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 07:36:13 GMT, Emanuel Peter wrote: > We have experienced higher runtime with extra flags (e.g. `-XComp` and some verification flags). I'm increasing the timeout. > > I checked: it does not seem that any specific method takes much more time than others, I suspect that we do not inline well with `-XComp`, and that makes the `MemorySegment` loop code significantly slower. This pull request has now been integrated.
Changeset: 75f028b4 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/75f028b46b245bdcbde8391af69020befda66b7d Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8348657: compiler/loopopts/superword/TestEquivalentInvariants.java timed out Reviewed-by: thartmann ------------- PR: https://git.openjdk.org/jdk/pull/23914 From xgong at openjdk.org Wed Mar 5 10:05:52 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 5 Mar 2025 10:05:52 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: <17arBLP__LxvP0MS9liI_o9HTVpzxDJrjy3LjYNn8Ng=.8be17d07-09b0-4d31-b41f-66d3c9be5fad@github.com> <2H_Ol6dl2XhWKowrqvJbdAEFoHWYNu65dR60bjkIaPQ=.879025d0-d0bc-49d1-94e3-da69666c372c@github.com> Message-ID: On Tue, 4 Mar 2025 08:38:20 GMT, Bhavana Kilambi wrote: >> Hi @Bhavana-Kilambi , I've finished the test with what you suggested on my Grace CPU. The vectorapi jtreg tests all pass. So this solution works well. But performance shows no obvious change compared with the current PR's codegen, where an improvement was expected.
>>
>> Here is the performance data:
>>
>> Benchmark                                     (size)  Mode   Cnt    Current  Bhavana's  Units    Gain
>> Double128Vector.rearrange                       1024  thrpt   30    591.504    588.616  ops/ms  0.995
>> Long128Vector.rearrange                         1024  thrpt   30    593.348    590.802  ops/ms  0.995
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt   30  16576.713  16664.580  ops/ms  1.005
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt   30   8358.694   8392.733  ops/ms  1.004
>> SelectFromBenchmark.rearrangeFromDoubleVector   1024  thrpt   30   1312.752   1213.538  ops/ms  0.924
>> SelectFromBenchmark.rearrangeFromDoubleVector   2048  thrpt   30    657.365    607.060  ops/ms  0.923
>> SelectFromBenchmark.rearrangeFromFloatVector    1024  thrpt   30   1905.595   1911.831  ops/ms  1.003
>> SelectFromBenchmark.rearrangeFromFloatVector    2048  thrpt   30    952.205    957.160  ops/ms  1.005
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt   30   2106.763   2107.238  ops/ms  1.000
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt   30   1056.299   1056.769  ops/ms  1.000
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt   30   1462.355   1247.853  ops/ms  0.853
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt   30    732.559    616.753  ops/ms  0.841
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt   30   4560.253   4559.861  ops/ms  0.999
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt   30   2279.058   2279.693  ops/ms  1.000
>> VectorXXH3HashingBenchmark.hashingKernel        1024  thrpt   30   1080.589   1073.883  ops/ms  0.993
>> VectorXXH3HashingBenchmark.hashingKernel        2048  thrpt   30    541.629    537.288  ops/ms  0.991
>> VectorXXH3HashingBenchmark.hashingKernel        4096  thrpt   30    269.886    268.460  ops/ms  0.994
>> VectorXXH3HashingBenchmark.hashingKernel        8192  thrpt   30    135.193    134.175  ops/ms  0.992
>>
>>
>> I expected an obvious improvement since we do not need the heavy `ldr` instruction. But I also got similar performance data on an AArch64 N1 machine. One shortcoming of your suggestion that I can see is that it needs one more temp vect...
> > Hi @XiaohongGong , thanks for testing this variation. I also expected it to have relatively better performance due to the absence of the load instruction. It might help in larger real-world workloads where reducing some load instructions or having fewer instructions can help performance (by reducing pressure on icache/iTLB). > Thinking of aarch64 Neon machines that we can test this on - we have only N1, V2 (Grace) machines which have support for 128-bit Neon. V1 is 256 bit Neon/SVE which will execute the `sve tbl` instruction instead. I can of course disable SVE and run the Neon instructions on V1 but I don't think that would really make any difference. So for 128-bit Neon machines, I can also test only on N1 and V2 which you've already done. Do you have a specific machine in mind that you'd like this to be tested on? Thanks for the clarification @Bhavana-Kilambi . I agree with you that it may not make any difference on other machines. So do you suggest that I change the pattern right now, or revisit this part once we hit a performance issue on other real-world workloads? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r1981081284 From cushon at openjdk.org Wed Mar 5 15:46:02 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Wed, 5 Mar 2025 15:46:02 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 18:45:52 GMT, Liam Miller-Cushon wrote: > Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. > > https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. > This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these > transitive uses to the worklist.
> > The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for > CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". > We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. Thanks for taking a look! > Can you please add a description to the PR (and if possible also on JIRA) to explain what the issue is, and how you are fixing it? Done > Is there a regression test that reproduces this reliably? Matthias reports: Re: regtest: I don't currently have a standalone regtest, I am using java/lang/Character/CheckProp.java from the issue report to reproduce the failure and verify the fix. I can try and extract the failure condition into a standalone test. Does the fix look reasonable? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23871#issuecomment-2701332152 From vlivanov at openjdk.org Wed Mar 5 16:46:00 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 5 Mar 2025 16:46:00 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v3] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 13:37:13 GMT, Damon Fenacci wrote: >> # Issue >> >> The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2 compiling the test `testAndMaskSameValue1` expects to have 1 `AndV` node but it has none. >> >> # Cause >> >> The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type.
>> The graph that leads to the issue looks like this: >> ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) >> The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: >> ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) >> The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead. >> The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`. >> >> This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match. >> >> # Solution >> >> Instead of fixing this specific case we try a more generic approach: when late inlining we keep track of failed intrinsics and re-examine them during IGVN. If the `Ideal` method for their call node is called, we reschedule the intrinsic attempt for that call. >> >> # Testing >> >> Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) >> >> Tests: Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) > > Damon Fenacci has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 40 commits: > > - JDK-8302459: unneeded changes > - JDK-8302459: unneeded changes > - JDK-8302459: update assert string > - JDK-8302459: fix copyright year > - JDK-8302459: fix after merge > - Merge branch 'master' into JDK-8302459-new > - JDK-8302459: add logging > - JDK-8302459: remove todos > - JDK-8302459: add check to avoid infinite loop > - Merge branch 'master' into JDK-8302459-new > - ... and 30 more: https://git.openjdk.org/jdk/compare/a637ccf2...e71e72f5 Nice! Overall, looks good. Some minor comments/suggestions. src/hotspot/share/opto/callnode.cpp line 1112: > 1110: "static call node changed: trying again"); > 1111: } > 1112: phase->C->prepend_late_inline(cg); There are 4 occurrences of `prepend_late_inline` followed by `set_generator(nullptr)`. Does it deserve a helper method? src/hotspot/share/opto/compile.cpp line 2044: > 2042: break; // process one call site at a time > 2043: } else { > 2044: if (C->igvn_worklist()->member(cg->call_node()) == is_scheduled_for_igvn_before) { // avoid potential infinite loop Can you remind me, please, what exactly we are trying to catch here? I remember I expressed concerns about the call node being scheduled for IGVN during incremental inlining attempt causing infinite loop during incremental inlining. Does the same apply if the node disappears from IGVN work list during incremental inlining attempt? (It took me some time to recollect what's going on here. Maybe introduce `is_scheduled_for_igvn_after` local and add a comment why both mismatches - `false -> true` and `true -> false` - are problematic?) ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21682#pullrequestreview-2661688405 PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r1981755545 PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r1981791576 From kvn at openjdk.org Wed Mar 5 17:21:56 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 5 Mar 2025 17:21:56 GMT Subject: RFR: 8351256: Improve printing of runtime call stub names in disassember output In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 09:13:02 GMT, Andrew Dinn wrote: > Fixes printing of runtime stub call targets in disassembler listings. Good. Thank you for fast PR submission. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23915#pullrequestreview-2661845333 From sparasa at openjdk.org Wed Mar 5 18:08:08 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 5 Mar 2025 18:08:08 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v3] In-Reply-To: References: Message-ID: > The goal of this PR is to generate code using APX NDD instructions. 
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:

  remove epopcount, elzcnt, etzcnt

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23501/files
  - new: https://git.openjdk.org/jdk/pull/23501/files/5306c39c..f09266d0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=01-02

  Stats: 146 lines in 1 file changed: 0 ins; 134 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/23501.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23501/head:pull/23501

PR: https://git.openjdk.org/jdk/pull/23501

From sparasa at openjdk.org Wed Mar 5 18:13:02 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Wed, 5 Mar 2025 18:13:02 GMT
Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 4 Mar 2025 19:22:55 GMT, Sandhya Viswanathan wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   revert to nf version for {pop/tz/lz}cnt count instructions
>
> src/hotspot/cpu/x86/x86_64.ad line 5573:
>
>> 5571: ins_pipe(ialu_reg);
>> 5572: %}
>> 5573:
>
> This instruct could be removed, as this is already a unary operation with a separate destination. Likewise, other unary operator instructs could also be removed where the destination is already separate from the source.

The NDD instructions epopcnt, elzcnt and etzcnt were removed, as they already have a separate destination. Please see the updated code.
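Editorial aside: the reason unary instructs gain nothing from NDD can be sketched with a toy lowering model. The sketch below is only an illustration in Python; the function names and the pseudo-assembly strings are hypothetical and are not the HotSpot assembler API or the code in this PR. It assumes the usual x86 convention that a legacy binary operation overwrites its first operand, whereas an NDD form names the destination as a separate operand.

```python
# Toy model only: mnemonic strings are illustrative pseudo-assembly,
# not the HotSpot assembler API and not the code from the PR.

def lower_binary_legacy(op, dst, src1, src2):
    """Legacy two-operand form: `op dst, src` overwrites dst,
    so dst must hold src1 before the operation is issued."""
    if dst == src2 and dst != src1:
        # Copying src1 into dst would clobber src2; a real register
        # allocator would swap operands or use a temporary instead.
        raise ValueError("dst aliases src2")
    code = []
    if dst != src1:
        code.append(f"mov {dst}, {src1}")
    code.append(f"{op} {dst}, {src2}")
    return code


def lower_binary_ndd(op, dst, src1, src2):
    """NDD form: the destination is a separate operand, so no copy is needed."""
    return [f"{op} {dst}, {src1}, {src2}"]


def lower_unary(op, dst, src):
    """Unary operations such as popcnt already take a destination that is
    distinct from the source, so an NDD variant saves nothing."""
    return [f"{op} {dst}, {src}"]
```

In this model the NDD form saves one `mov` for binary operations, while the unary lowering is already a single instruction, which is the point made in the review comment.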
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1981922722 From sviswanathan at openjdk.org Wed Mar 5 18:20:05 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 5 Mar 2025 18:20:05 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v2] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 21:53:41 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to generate code using APX NDD instructions. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > revert to nf version for {pop/tz/lz}cnt count instructions test/hotspot/gtest/x86/x86-asmtest.py line 630: > 628: if RegOp in [RegRegImmNddInstruction]: > 629: test_reg1 = 'rax' > 630: test_reg2 = random.choice(test_regs) We don't need to generate another test_reg2, we could use the same from above. Thereby, we will not modify the existing tests and only have the new test instructions added in asmtest.out.h. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1981908543 From kvn at openjdk.org Wed Mar 5 19:16:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 5 Mar 2025 19:16:52 GMT Subject: RFR: 8351256: Improve printing of runtime call stub names in disassember output In-Reply-To: References: Message-ID: <5FlgsT-uGqu8RO7TWcgFHn3tzchI2mPEia7UcrcQEdA=.c3c16f2b-b906-4d78-b4a8-936a3ac2a758@github.com> On Wed, 5 Mar 2025 09:13:02 GMT, Andrew Dinn wrote: > Fixes printing of runtime stub call targets in disassembler listings. ? I can't be reviewer if I am author. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23915#issuecomment-2701844761 From kvn at openjdk.org Wed Mar 5 19:21:03 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 5 Mar 2025 19:21:03 GMT Subject: RFR: 8351256: Improve printing of runtime call stub names in disassember output In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 09:20:07 GMT, Andrew Dinn wrote: >> Fixes printing of runtime stub call targets in disassembler listings. > > Verified by eyeball: > > > $ java -XX:CompileCommand='print,java.lang.Object::' Hello > CompileCommand: print java/lang/Object. bool print = true > . . . > Compiled method (c1) 98 1 3 java.lang.Object:: (1 bytes) > . . . > 0x0000ffff670001b8: cmp x8, x9 > 0x0000ffff670001bc: b.eq 0x0000ffff670001d8 // b.none > ;; 0xFFFF6E59EAC0 > 0x0000ffff670001c0: mov x8, #0xeac0 // #60096 > ; {runtime_call Stub::method_entry_barrier} > 0x0000ffff670001c4: movk x8, #0x6e59, lsl #16 > 0x0000ffff670001c8: movk x8, #0xffff, lsl #32 > 0x0000ffff670001cc: blr x8 > . . . @adinn Can you add additional change to `relocInfo.cpp` to print call resolution blobs ?: https://github.com/vnkozlov/jdk/commit/620b99f8ef7f1bea527d27e4a625a000e83d6663 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23915#issuecomment-2701853164 From kvn at openjdk.org Wed Mar 5 19:52:57 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 5 Mar 2025 19:52:57 GMT Subject: RFR: 8351256: Improve printing of runtime call stub names in disassember output In-Reply-To: References: Message-ID: <0ElONkl1n7L7cCjPsQ0bU4R61udIuaC4KBzJnGgoIgI=.e47ecf4d-678e-4e50-8670-8bbed0763e87@github.com> On Wed, 5 Mar 2025 09:13:02 GMT, Andrew Dinn wrote: > Fixes printing of runtime stub call targets in disassembler listings. And one more in `nmethod.cpp` which fixes the issue in both, mainline and leyden. 
`relocInfo::none` may point to the same address as the following relocation info:

@0x000000010ec23436: 0002
relocInfo at 0x000000010ec23436 [type=0(none) addr=0x000000010ec234c8 offset=8]
@0x000000010ec23438: 3000
relocInfo at 0x000000010ec23438 [type=6(runtime_call) addr=0x000000010ec234c8 offset=0] | [destination=0x000000010ec1f400] Blob::ExceptionBlob

As a result, the real relocation info for this address is not printed:

[Exception Handler]
0x000000010ca8c1c8: adrp x8, #0x10ca85000 ; {no_reloc}

The fix:

@@ -3550,7 +3550,10 @@ const char* nmethod::reloc_string_for(u_char* begin, u_char* end) {
   while (iter.next()) {
     have_one = true;
     switch (iter.type()) {
-        case relocInfo::none: return "no_reloc";
+        case relocInfo::none: {
+          // Skip it and check next
+          break;
+        }
         case relocInfo::oop_type: {
           // Get a non-resizable resource-allocated stringStream.
           // Our callees make use of (nested) ResourceMarks.

After fix:

[Exception Handler]
0x000000010ec234c8: adrp x8, #0x10ec1f000 ; {runtime_call ExceptionBlob}

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23915#issuecomment-2701916728

From kvn at openjdk.org Wed Mar 5 20:00:58 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 5 Mar 2025 20:00:58 GMT
Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v8]
In-Reply-To:
References:
Message-ID: <1AKhJVL_TmLYFGqpwaMYXi1rMa0Whn0h5VCY3SSWC2E=.ac977d8c-32f3-4d62-95f1-80a1a5c5717e@github.com>

On Wed, 5 Mar 2025 07:21:12 GMT, Emanuel Peter wrote:

>> @rwestrel asked me for this here:
>> https://github.com/openjdk/jdk/pull/22016#issuecomment-2684365921
>>
>> The idea: an unused `multiversion_if` (i.e. one where the `slow_loop` is still delayed, i.e. where the `fast_loop` has not yet added a runtime-check to the `multiversion_if`) can be constant-folded if the main `fast_loop` disappears.
Because at that point we know that we will never add a new condition to the `multiversion_if`, and it will constant fold to true (towards the `fast_loop`) after loop-opts anyway. >> >> It also seems to fix a bug, where all multiversioned loops (fast / slow, pre/main/post) disappear, and then we are left with a if-diamond with the multiversion_if. This then hits assertion code in `PhaseIdealLoop::conditional_move`. This issue is also addressed with this patch here. I adjusted the code around that assert slightly to give better reporting and ensure we bail out of the optimization if we see an unexpected pattern. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add more comments for Vladimir Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23865#pullrequestreview-2662352622 From sviswanathan at openjdk.org Wed Mar 5 20:09:00 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 5 Mar 2025 20:09:00 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v3] In-Reply-To: References: Message-ID: <7DTflbzBDB9d3ybuR9Zf-opTa59AV5rCrStYBCluhgg=.703447f7-056f-4425-8867-b87a5c8e4c5f@github.com> On Wed, 5 Mar 2025 18:08:08 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to generate code using APX NDD instructions. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > remove epopcount, elzcnt, etzcnt src/hotspot/cpu/x86/x86_64.ad line 5796: > 5794: %} > 5795: > 5796: A nit pick, unnecessary extra blank lines :). src/hotspot/cpu/x86/x86_64.ad line 6239: > 6237: > 6238: > 6239: instruct cmovI_regUCF2_ne(cmpOpUCF2 cop, rFlagsRegUCF cr, rRegI dst, rRegI src) %{ The cmovI_regUCF2_ne, cmovl_regUCF2_eq, cmovP_regUCF2_ne, cmovP_regUCF2_eq, cmovL_regUCF2_ne, cmovL_regUCF2_eq instructs could also use the ecmovl() instructions. 
src/hotspot/cpu/x86/x86_64.ad line 6871: > 6869: predicate(UseAPX); > 6870: match(Set dst (AddI src1 src2)); > 6871: effect(KILL cr); We should also bring in the corresponding flag(PD::...); line from instruct addI_rReg in this and other rules where applicable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1981936770 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1981945068 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1982059646 From duke at openjdk.org Wed Mar 5 21:51:26 2025 From: duke at openjdk.org (David Linus Briemann) Date: Wed, 5 Mar 2025 21:51:26 GMT Subject: RFR: 8350866: [x86] Add C1 intrinsics for CRC32-C Message-ID: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> 8350866: [x86] Add C1 intrinsics for CRC32-C ------------- Commit messages: - add intrinsic for do_update_CRC32C x86 c1 Changes: https://git.openjdk.org/jdk/pull/23826/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23826&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350866 Stats: 68 lines in 2 files changed: 66 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23826.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23826/head:pull/23826 PR: https://git.openjdk.org/jdk/pull/23826 From duke at openjdk.org Wed Mar 5 21:52:41 2025 From: duke at openjdk.org (David Linus Briemann) Date: Wed, 5 Mar 2025 21:52:41 GMT Subject: RFR: 8350325: [PPC64] ConvF2HFIdealizationTests timeouts on Power8 Message-ID: Skip ConvF2HFIdealizationTests for Power8 ------------- Commit messages: - switch skipped test to slow one - fix expression - skip ScalarFloat16OperationsTest for Power 8 Changes: https://git.openjdk.org/jdk/pull/23692/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23692&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350325 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23692.diff 
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23692/head:pull/23692 PR: https://git.openjdk.org/jdk/pull/23692 From mdoerr at openjdk.org Wed Mar 5 21:52:41 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 5 Mar 2025 21:52:41 GMT Subject: RFR: 8350325: [PPC64] ConvF2HFIdealizationTests timeouts on Power8 In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 12:02:01 GMT, David Linus Briemann wrote: > Skip ConvF2HFIdealizationTests for Power8 LGTM. Thanks! ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23692#pullrequestreview-2634100621 From adinn at openjdk.org Wed Mar 5 22:21:21 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 5 Mar 2025 22:21:21 GMT Subject: RFR: 8351256: Improve printing of runtime call stub names in disassember output [v2] In-Reply-To: References: Message-ID: > Fixes printing of runtime stub call targets in disassembler listings. Andrew Dinn has updated the pull request incrementally with two additional commits since the last revision: - skip reloc_none case when printing stub targets in nmethods - add stub target printing for more reloc types ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23915/files - new: https://git.openjdk.org/jdk/pull/23915/files/f976e016..8f774199 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23915&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23915&range=00-01 Stats: 16 lines in 2 files changed: 15 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23915.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23915/head:pull/23915 PR: https://git.openjdk.org/jdk/pull/23915 From adinn at openjdk.org Wed Mar 5 22:21:22 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 5 Mar 2025 22:21:22 GMT Subject: RFR: 8351256: Improve printing of runtime call stub names in disassember output In-Reply-To: <0ElONkl1n7L7cCjPsQ0bU4R61udIuaC4KBzJnGgoIgI=.e47ecf4d-678e-4e50-8670-8bbed0763e87@github.com> 
References: <0ElONkl1n7L7cCjPsQ0bU4R61udIuaC4KBzJnGgoIgI=.e47ecf4d-678e-4e50-8670-8bbed0763e87@github.com> Message-ID: On Wed, 5 Mar 2025 19:50:10 GMT, Vladimir Kozlov wrote: >> Fixes printing of runtime stub call targets in disassembler listings. > > And one more in `nmethod.cpp` which fixes the issue in both, mainline and leyden. > `RelocInfo::none` may point to the same address as following relocation info: > > @0x000000010ec23436: 0002 > relocInfo at 0x000000010ec23436 [type=0(none) addr=0x000000010ec234c8 offset=8] > @0x000000010ec23438: 3000 > relocInfo at 0x000000010ec23438 [type=6(runtime_call) addr=0x000000010ec234c8 offset=0] | [destination=0x000000010ec1f400] Blob::ExceptionBlob > > > As result the real relocation info for this address is not printed: > > > [Exception Handler] > 0x000000010ca8c1c8: adrp x8, #0x10ca85000 ; {no_reloc} > > > > The fix: > > > @@ -3550,7 +3550,10 @@ const char* nmethod::reloc_string_for(u_char* begin, u_char* end) { > while (iter.next()) { > have_one = true; > switch (iter.type()) { > - case relocInfo::none: return "no_reloc"; > + case relocInfo::none: { > + // Skip it and check next > + break; > + } > case relocInfo::oop_type: { > // Get a non-resizable resource-allocated stringStream. > // Our callees make use of (nested) ResourceMarks. > > > After fix: > > > [Exception Handler] > 0x000000010ec234c8: adrp x8, #0x10ec1f000 ; {runtime_call ExceptionBlob} @vnkozlov I pushed the recommended fixes for both reloc info and nmethod. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23915#issuecomment-2702205803 From kvn at openjdk.org Wed Mar 5 22:32:02 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 5 Mar 2025 22:32:02 GMT Subject: RFR: 8351256: Improve printing of runtime call stub names in disassember output [v2] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 22:21:21 GMT, Andrew Dinn wrote: >> Fixes printing of runtime stub call targets in disassembler listings. 
> > Andrew Dinn has updated the pull request incrementally with two additional commits since the last revision: > > - skip reloc_none case when printing stub targets in nmethods > - add stub target printing for more reloc types Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23915#pullrequestreview-2662648488 From mdoerr at openjdk.org Wed Mar 5 22:38:58 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 5 Mar 2025 22:38:58 GMT Subject: RFR: 8350866: [x86] Add C1 intrinsics for CRC32-C In-Reply-To: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> References: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> Message-ID: On Thu, 27 Feb 2025 14:30:42 GMT, David Linus Briemann wrote: > 8350866: [x86] Add C1 intrinsics for CRC32-C src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp line 1134: > 1132: > 1133: void LIRGenerator::do_update_CRC32C(Intrinsic* x) { > 1134: assert(UseCRC32CIntrinsics, "need AVX and LCMUL instructions support"); I think "LCMUL" is a typo. Should probably be "CLMUL". Also in the other comment from which it is copied. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23826#discussion_r1982285177 From sparasa at openjdk.org Wed Mar 5 23:40:29 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 5 Mar 2025 23:40:29 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v4] In-Reply-To: References: Message-ID: > The goal of this PR is to generate code using APX NDD instructions. Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: add flag(PD::...) 
and clean up loadNKlassCompactHeaders ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23501/files - new: https://git.openjdk.org/jdk/pull/23501/files/f09266d0..c578c936 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=02-03 Stats: 81 lines in 1 file changed: 58 ins; 18 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23501/head:pull/23501 PR: https://git.openjdk.org/jdk/pull/23501 From sviswanathan at openjdk.org Thu Mar 6 01:11:13 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 6 Mar 2025 01:11:13 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v3] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 18:08:08 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to generate code using APX NDD instructions. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > remove epopcount, elzcnt, etzcnt src/hotspot/cpu/x86/x86_64.ad line 8253: > 8251: %} > 8252: > 8253: instruct negI_rReg_ndd(rRegI src, rRegI dst, immI_0 zero, rFlagsReg cr) A nit pick in many of the new negI/negL instructs, we usually list the dst first in instruct. src/hotspot/cpu/x86/x86_64.ad line 9060: > 9058: > 9059: // Arithmetic Shift Right by variable > 9060: instruct sarI_rReg_CL_ndd(rRegI dst, rRegI src, rcx_RegI shift, rFlagsReg cr) The new instructs sarI_rReg_CL_ndd, shrI_rReg_CL_ndd, salL_rReg_CL_ndd, sarL_rReg_CL_ndd, shrL_rReg_CL_ndd could be removed and the original !bmi2 versions could be kept. We dont need to optimize with APX instructions for non bmi2 platforms. 
src/hotspot/cpu/x86/x86_64.ad line 10401: > 10399: %} > 10400: > 10401: instruct orI_rReg_imm_rReg_ndd(rRegI dst, immI src1, rRegI src2, rFlagsReg cr) It looks to me that we only need one of orI_rReg_rReg_imm_ndd or orI_rReg_imm_rReg_ndd as orI is a commutative operator. src/hotspot/cpu/x86/x86_64.ad line 10806: > 10804: %} > 10805: > 10806: instruct andL_rReg_rReg_mem_ndd(rRegL dst, rRegL src1, memory src2, rFlagsReg cr) It looks to me that we only need one of andL_rReg_rReg_mem_ndd & andL_rReg_mem_rReg_ndd as and is a commutative operator. Likewise for other commutative operators like or, add, xor, mul, etc. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1982329656 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1982343164 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1982391235 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1982388864 From sviswanathan at openjdk.org Thu Mar 6 01:11:14 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 6 Mar 2025 01:11:14 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v4] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 23:40:29 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to generate code using APX NDD instructions. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > add flag(PD::...) and clean up loadNKlassCompactHeaders src/hotspot/cpu/x86/x86_64.ad line 9744: > 9742: instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg cr) > 9743: %{ > 9744: predicate(!UseAPX && !VM_Version::supports_bmi2() && n->bottom_type()->basic_type() == T_INT); Don't need to change anything for non bmi2 platforms. The original predicate can be kept as is. This applies to all rorI, rolI, rorL, rolL. 
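Editorial aside on the commutativity remark in the v3 review above (only one of `orI_rReg_rReg_imm_ndd` / `orI_rReg_imm_rReg_ndd` is needed): the idea can be illustrated with a toy matcher in Python. Everything here is hypothetical, including the tuple encoding of nodes and the function names; the thread does not say how operand order is normalized in C2, so this only shows why one register/immediate rule can cover both operand orders of a commutative operator.

```python
# Toy model only: nodes are (op, left, right) tuples, registers are strings
# and immediates are ints. None of this is HotSpot or ADLC code.

COMMUTATIVE = {"or", "and", "add", "xor", "mul"}


def canonicalize(op, left, right):
    """For commutative ops, move an immediate operand to the right-hand side."""
    if op in COMMUTATIVE and isinstance(left, int) and not isinstance(right, int):
        return (op, right, left)
    return (op, left, right)


def matches_reg_imm(node):
    """A single rule shape: register on the left, immediate on the right."""
    _, left, right = node
    return isinstance(left, str) and isinstance(right, int)
```

With this normalization, `("or", 5, "rax")` and `("or", "rax", 5)` both reach the same rule, while a non-commutative operation such as `sub` keeps its operand order and would still need its own handling.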
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1982407166 From duke at openjdk.org Thu Mar 6 01:35:54 2025 From: duke at openjdk.org (Nicole Xu) Date: Thu, 6 Mar 2025 01:35:54 GMT Subject: RFR: 8346954: [JMH] jdk.incubator.vector.MaskedLogicOpts fails due to IndexOutOfBoundsException [v2] In-Reply-To: References: <3zzpTrqxv5KaBP-FKCAWjfffVonoWr9fKE6S8lO-cTY=.48f4cb20-f9e6-473f-8156-18d1694e7496@github.com> Message-ID: On Thu, 27 Feb 2025 01:55:59 GMT, Xiaohong Gong wrote: >> Nicole Xu has updated the pull request incrementally with two additional commits since the last revision: >> >> - 8346954: [JMH] jdk.incubator.vector.MaskedLogicOpts fails due to IndexOutOfBoundsException >> >> Suite MaskedLogicOpts.maskedLogicOperationsLong512() failed on both x86 >> and AArch64 with the following error: >> >> ``` >> java.lang.IndexOutOfBoundsException: Index 252 out of bounds for length 249 >> ``` >> >> The variable `long256_arr_idx` is misused when indexing `LongVector l2`, >> `l3`, `l4`, `l5` in function `maskedLogicOperationsLongKernel()` >> resulting in the IndexOutOfBoundsException error. On the other hand, the >> unified index for 128-bit, 256-bit and 512-bit species might not be >> proper since it leaves gaps in between when accessing the data >> for 128-bit and 256-bit species. This will unnecessarily include the >> noise due to cache misses or (on some targets) prefetching additional >> cache lines which are not usable, thereby impacting the crispness of >> microbenchmark. >> >> Hence, we improved the benchmark from several aspects, >> 1. Used sufficient number of predicated operations within the vector >> loop while minimizing the noise due to memory operations. >> 2. Modified the index computation logic which can now withstand any >> ARRAYLEN without resulting in an IOOBE. >> 3. Removed redundant vector read/writes to instance fields, thus >> eliminating significant boxing penalty which translates into throughput >> gains. 
>> >> Change-Id: Ie8a9d495b1ca5e36f1eae069ff70a815a2de00c0 >> - Revert "8346954: [JMH] jdk.incubator.vector.MaskedLogicOpts fails due to IndexOutOfBoundsException" >> >> This reverts commit 083bedec04d5ab78a420e156e74c1257ce30aee8. > > Still looks good to me! @XiaohongGong @eme64 @jatin-bhateja @PaulSandoz Thanks for your review. I'm going to integrate the patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22963#issuecomment-2702495138 From duke at openjdk.org Thu Mar 6 01:35:55 2025 From: duke at openjdk.org (duke) Date: Thu, 6 Mar 2025 01:35:55 GMT Subject: RFR: 8346954: [JMH] jdk.incubator.vector.MaskedLogicOpts fails due to IndexOutOfBoundsException [v2] In-Reply-To: <3zzpTrqxv5KaBP-FKCAWjfffVonoWr9fKE6S8lO-cTY=.48f4cb20-f9e6-473f-8156-18d1694e7496@github.com> References: <3zzpTrqxv5KaBP-FKCAWjfffVonoWr9fKE6S8lO-cTY=.48f4cb20-f9e6-473f-8156-18d1694e7496@github.com> Message-ID: On Wed, 26 Feb 2025 07:04:58 GMT, Nicole Xu wrote: >> Suite `MaskedLogicOpts.maskedLogicOperationsLong512()` failed on both x86 and AArch64 with the following error: >> >> >> java.lang.IndexOutOfBoundsException: Index 252 out of bounds for length 249 >> >> >> The variable `long256_arr_idx` is misused when indexing `LongVector l2`, `l3`, `l4`, `l5` in function `maskedLogicOperationsLongKernel()` resulting in the IndexOutOfBoundsException error. On the other hand, the unified index for 128-bit, 256-bit and 512-bit species might not be proper since it leaves gaps in between when accessing the data for 128-bit and 256-bit species. This will unnecessarily include the noise due to cache misses or (on some targets) prefetching additional cache lines which are not usable, thereby impacting the crispness of microbenchmark. >> >> Hence, we improved the benchmark from several aspects, >> 1. Used sufficient number of predicated operations within the vector loop while minimizing the noise due to memory operations. >> 2. 
Modified the index computation logic which can now withstand any ARRAYLEN without resulting in an IOOBE. >> 3. Removed redundant vector read/writes to instance fields, thus eliminating significant boxing penalty which translates into throughput gains. > > Nicole Xu has updated the pull request incrementally with two additional commits since the last revision: > > - 8346954: [JMH] jdk.incubator.vector.MaskedLogicOpts fails due to IndexOutOfBoundsException > > Suite MaskedLogicOpts.maskedLogicOperationsLong512() failed on both x86 > and AArch64 with the following error: > > ``` > java.lang.IndexOutOfBoundsException: Index 252 out of bounds for length 249 > ``` > > The variable `long256_arr_idx` is misused when indexing `LongVector l2`, > `l3`, `l4`, `l5` in function `maskedLogicOperationsLongKernel()` > resulting in the IndexOutOfBoundsException error. On the other hand, the > unified index for 128-bit, 256-bit and 512-bit species might not be > proper since it leaves gaps in between when accessing the data > for 128-bit and 256-bit species. This will unnecessarily include the > noise due to cache misses or (on some targets) prefetching additional > cache lines which are not usable, thereby impacting the crispness of > microbenchmark. > > Hence, we improved the benchmark from several aspects, > 1. Used sufficient number of predicated operations within the vector > loop while minimizing the noise due to memory operations. > 2. Modified the index computation logic which can now withstand any > ARRAYLEN without resulting in an IOOBE. > 3. Removed redundant vector read/writes to instance fields, thus > eliminating significant boxing penalty which translates into throughput > gains. > > Change-Id: Ie8a9d495b1ca5e36f1eae069ff70a815a2de00c0 > - Revert "8346954: [JMH] jdk.incubator.vector.MaskedLogicOpts fails due to IndexOutOfBoundsException" > > This reverts commit 083bedec04d5ab78a420e156e74c1257ce30aee8. 
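Editorial aside: the class of failure described above (a vector-wide access starting too close to the end of the array) can be illustrated with a small Python sketch. This is not the benchmark's actual fix; the helper names are hypothetical, and the lane count of 8 simply assumes a 512-bit species of 64-bit longs. The rounding mirrors what the Java Vector API's `VectorSpecies.loopBound` does.

```python
# Toy model only: plain integers stand in for array offsets; this is not
# the MaskedLogicOpts benchmark code.

def loop_bound(length, lanes):
    """Largest multiple of `lanes` that is less than or equal to `length`."""
    return length - (length % lanes)


def vector_offsets(length, lanes):
    """Offsets at which a full `lanes`-wide access stays inside an array
    of `length` elements, for any length."""
    return list(range(0, loop_bound(length, lanes), lanes))
```

For the array length of 249 from the error message and 8 lanes, the last offset produced is 240, so the final access covers elements 240 to 247 and the single leftover element would be handled by a scalar tail (or skipped in a benchmark).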
@xyyNicole Your change (at version 896c27ea0b74f2848185d1bd8f931a0f44249673) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22963#issuecomment-2702497534 From duke at openjdk.org Thu Mar 6 01:43:05 2025 From: duke at openjdk.org (Nicole Xu) Date: Thu, 6 Mar 2025 01:43:05 GMT Subject: Integrated: 8346954: [JMH] jdk.incubator.vector.MaskedLogicOpts fails due to IndexOutOfBoundsException In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 09:04:47 GMT, Nicole Xu wrote: > Suite `MaskedLogicOpts.maskedLogicOperationsLong512()` failed on both x86 and AArch64 with the following error: > > > java.lang.IndexOutOfBoundsException: Index 252 out of bounds for length 249 > > > The variable `long256_arr_idx` is misused when indexing `LongVector l2`, `l3`, `l4`, `l5` in function `maskedLogicOperationsLongKernel()` resulting in the IndexOutOfBoundsException error. On the other hand, the unified index for 128-bit, 256-bit and 512-bit species might not be proper since it leaves gaps in between when accessing the data for 128-bit and 256-bit species. This will unnecessarily include the noise due to cache misses or (on some targets) prefetching additional cache lines which are not usable, thereby impacting the crispness of microbenchmark. > > Hence, we improved the benchmark from several aspects, > 1. Used sufficient number of predicated operations within the vector loop while minimizing the noise due to memory operations. > 2. Modified the index computation logic which can now withstand any ARRAYLEN without resulting in an IOOBE. > 3. Removed redundant vector read/writes to instance fields, thus eliminating significant boxing penalty which translates into throughput gains. This pull request has now been integrated. 
Changeset: 107ee878 Author: Nicole Xu URL: https://git.openjdk.org/jdk/commit/107ee878d66f4006f102c1fd12af3bf156a25757 Stats: 145 lines in 1 file changed: 5 ins; 29 del; 111 mod 8346954: [JMH] jdk.incubator.vector.MaskedLogicOpts fails due to IndexOutOfBoundsException Co-authored-by: Jatin Bhateja Reviewed-by: jbhateja, xgong ------------- PR: https://git.openjdk.org/jdk/pull/22963 From duke at openjdk.org Thu Mar 6 02:03:10 2025 From: duke at openjdk.org (kuaiwei) Date: Thu, 6 Mar 2025 02:03:10 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value [v13] In-Reply-To: References: <6XEmUiapz_UElQlM-x5g61YOp2DSqSh3b0Vdq1jsWx8=.c7cd0945-47e4-49cb-bbd4-e6b7bc06c743@github.com> Message-ID: On Wed, 5 Mar 2025 06:40:01 GMT, Fei Yang wrote: >> kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: >> >> - Merge remote-tracking branch 'origin/master' into pr/merge_stores_reverse >> - Add readable comment >> - Fix for review comments >> - Allow ValueOrder::Reverse on big-endian platforms >> - Revert "Merge more stores" >> >> This reverts commit 1e1113ed02ec5a9fe181f215d5667e8de487fe47. >> - Revert "Fix test502aBE" >> >> This reverts commit f773fa368577c4f67957c4d40968c5c45e3ae205. >> - Fix test502aBE >> - Merge more stores >> - Remove an useless assertion >> - Remove tailing white space >> - ... and 9 more: https://git.openjdk.org/jdk/compare/aac9cb45...b3243a56 > > test/hotspot/jtreg/compiler/c2/TestMergeStores.java line 810: > >> 808: IRNode.REVERSE_BYTES_L, "1"}, >> 809: applyIf = {"UseUnalignedAccesses", "true"}, >> 810: applyIfPlatformAnd = {"little-endian", "true", "riscv64", "false"}) // Exclude RISCV64 because ReverseBytes are not supported > > Seems to me the code comment could be made more accurate. In fact, all the `ReverseBytes` variants will be available on riscv64 if we have the Zbb extension [1]. 
And I see the newly added IR tests in this file also work on such platforms. I wonder if there is an easy way to enable these IR tests for these platforms.
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv_b.ad#L181

@RealFYang, thanks for your comments. As you mentioned, the IR test framework cannot check for the Zbb extension on the RISC-V platform, so I disabled the test for RISCV64. I think we can open a new PR to enhance the IR test framework and enable the test.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1982457147

From fyang at openjdk.org Thu Mar 6 02:20:00 2025
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 6 Mar 2025 02:20:00 GMT
Subject: RFR: 8347405: MergeStores with reverse bytes order value [v13]
In-Reply-To:
References: <6XEmUiapz_UElQlM-x5g61YOp2DSqSh3b0Vdq1jsWx8=.c7cd0945-47e4-49cb-bbd4-e6b7bc06c743@github.com>
Message-ID:

On Thu, 6 Mar 2025 01:59:55 GMT, kuaiwei wrote:

>> test/hotspot/jtreg/compiler/c2/TestMergeStores.java line 810:
>>
>>> 808: IRNode.REVERSE_BYTES_L, "1"},
>>> 809: applyIf = {"UseUnalignedAccesses", "true"},
>>> 810: applyIfPlatformAnd = {"little-endian", "true", "riscv64", "false"}) // Exclude RISCV64 because ReverseBytes are not supported
>>
>> Seems to me the code comment could be made more accurate. In fact, all the `ReverseBytes` variants will be available on riscv64 if we have the Zbb extension [1]. And I see the newly added IR tests in this file also work on such platforms. I wonder if there is an easy way to enable these IR tests for these platforms.
>>
>> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv_b.ad#L181
>
> @RealFYang, thanks for your comments. As you mentioned, the IR test framework cannot check for the Zbb extension on the RISC-V platform, so I disabled the test for RISCV64. I think we can open a new PR to enhance the IR test framework and enable the test.

That works for me. Maybe you can update the code comments to make it more accurate?
Like: `// Exclude riscv64 where ReverseBytes is only conditionally supported` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1982539011 From duke at openjdk.org Thu Mar 6 02:45:55 2025 From: duke at openjdk.org (kuaiwei) Date: Thu, 6 Mar 2025 02:45:55 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value [v14] In-Reply-To: References: Message-ID: > This patch enhance MergeStores optimization to support merge value with reverse byte order. > > Below is benchmark result before and after the patch: > > On aliyun g8y (aarch64) > |name | before | score2 | ratio | > |---|---|---|---| > |MergeStoreBench.setCharBS |5669.655000 |5669.566000 | 0.00 %| > |MergeStoreBench.setCharBV |5516.911000 |5516.273000 | 0.01 %| > |MergeStoreBench.setCharC |5578.644000 |5552.809000 | 0.47 %| > |MergeStoreBench.setCharLS |5782.140000 |5779.264000 | 0.05 %| > |MergeStoreBench.setCharLV |5496.403000 |5499.195000 | -0.05 %| > |MergeStoreBench.setIntB |6087.703000 |2768.385000 | 119.90 %| > |MergeStoreBench.setIntBU |6733.813000 |2950.240000 | 128.25 %| > |MergeStoreBench.setIntBV |1362.233000 |1361.821000 | 0.03 %| > |MergeStoreBench.setIntL |2834.785000 |2833.042000 | 0.06 %| > |MergeStoreBench.setIntLU |2947.145000 |2946.874000 | 0.01 %| > |MergeStoreBench.setIntLV |5506.791000 |5506.229000 | 0.01 %| > |MergeStoreBench.setIntRB |7634.279000 |5611.058000 | 36.06 %| > |MergeStoreBench.setIntRBU |7766.737000 |5551.281000 | 39.91 %| > |MergeStoreBench.setIntRL |5689.793000 |5689.385000 | 0.01 %| > |MergeStoreBench.setIntRLU |5628.287000 |5628.789000 | -0.01 %| > |MergeStoreBench.setIntRU |5536.039000 |5534.910000 | 0.02 %| > |MergeStoreBench.setIntU |5595.363000 |5567.810000 | 0.49 %| > |MergeStoreBench.setLongB |13722.671000 |6811.098000 | 101.48 %| > |MergeStoreBench.setLongBU |13728.844000 |4280.240000 | 220.75 %| > |MergeStoreBench.setLongBV |2785.255000 |2785.949000 | -0.02 %| > |MergeStoreBench.setLongL |5714.615000 |5710.402000 | 0.07 
%| > |MergeStoreBench.setLongLU |4128.746000 |4129.324000 | -0.01 %| > |MergeStoreBench.setLongLV |2793.125000 |2794.438000 | -0.05 %| > |MergeStoreBench.setLongRB |14465.223000 |7015.050000 | 106.20 %| > |MergeStoreBench.setLongRBU |14546.954000 |6173.210000 | 135.65 %| > |MergeStoreBench.setLongRL |6816.145000 |6813.348000 | 0.04 %| > |MergeStoreBench.setLongRLU |4289.445000 |4284.239000 | 0.12 %| > |MergeStoreBench.setLongRU |3132.471000 |3133.093000 | -0.02 %| > |MergeStoreBench.setLongU |3086.779000 |3087.298000 | -0.02 %| > > AMD EPYC 9T24 > |name | before | after | ratio | > |---|---|---|---| > |MergeStoreBench.setChar... kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Update riscv comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23030/files - new: https://git.openjdk.org/jdk/pull/23030/files/b3243a56..92e2fcb7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23030&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23030&range=12-13 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23030.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23030/head:pull/23030 PR: https://git.openjdk.org/jdk/pull/23030 From duke at openjdk.org Thu Mar 6 02:45:55 2025 From: duke at openjdk.org (kuaiwei) Date: Thu, 6 Mar 2025 02:45:55 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value [v13] In-Reply-To: References: <6XEmUiapz_UElQlM-x5g61YOp2DSqSh3b0Vdq1jsWx8=.c7cd0945-47e4-49cb-bbd4-e6b7bc06c743@github.com> Message-ID: On Thu, 6 Mar 2025 02:17:17 GMT, Fei Yang wrote: >> @RealFYang , thanks for your comments. As you mentioned, IR test framework can not check Zbb extension for RISCV platform, so I disable it for RISCV64. I think we can make a new PR to enhance IR test framework and enable the test. > > That works for me. Maybe you can update the code comments to make it more accurate? 
Like: > `// Exclude riscv64 where ReverseBytes is only conditionally supported` Comments updated ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1982600202 From duke at openjdk.org Thu Mar 6 02:45:56 2025 From: duke at openjdk.org (kuaiwei) Date: Thu, 6 Mar 2025 02:45:56 GMT Subject: Withdrawn: 8347405: MergeStores with reverse bytes order value In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 09:07:11 GMT, kuaiwei wrote: > This patch enhance MergeStores optimization to support merge value with reverse byte order. > > Below is benchmark result before and after the patch: > > On aliyun g8y (aarch64) > |name | before | score2 | ratio | > |---|---|---|---| > |MergeStoreBench.setCharBS |5669.655000 |5669.566000 | 0.00 %| > |MergeStoreBench.setCharBV |5516.911000 |5516.273000 | 0.01 %| > |MergeStoreBench.setCharC |5578.644000 |5552.809000 | 0.47 %| > |MergeStoreBench.setCharLS |5782.140000 |5779.264000 | 0.05 %| > |MergeStoreBench.setCharLV |5496.403000 |5499.195000 | -0.05 %| > |MergeStoreBench.setIntB |6087.703000 |2768.385000 | 119.90 %| > |MergeStoreBench.setIntBU |6733.813000 |2950.240000 | 128.25 %| > |MergeStoreBench.setIntBV |1362.233000 |1361.821000 | 0.03 %| > |MergeStoreBench.setIntL |2834.785000 |2833.042000 | 0.06 %| > |MergeStoreBench.setIntLU |2947.145000 |2946.874000 | 0.01 %| > |MergeStoreBench.setIntLV |5506.791000 |5506.229000 | 0.01 %| > |MergeStoreBench.setIntRB |7634.279000 |5611.058000 | 36.06 %| > |MergeStoreBench.setIntRBU |7766.737000 |5551.281000 | 39.91 %| > |MergeStoreBench.setIntRL |5689.793000 |5689.385000 | 0.01 %| > |MergeStoreBench.setIntRLU |5628.287000 |5628.789000 | -0.01 %| > |MergeStoreBench.setIntRU |5536.039000 |5534.910000 | 0.02 %| > |MergeStoreBench.setIntU |5595.363000 |5567.810000 | 0.49 %| > |MergeStoreBench.setLongB |13722.671000 |6811.098000 | 101.48 %| > |MergeStoreBench.setLongBU |13728.844000 |4280.240000 | 220.75 %| > |MergeStoreBench.setLongBV |2785.255000 |2785.949000 | 
-0.02 %| > |MergeStoreBench.setLongL |5714.615000 |5710.402000 | 0.07 %| > |MergeStoreBench.setLongLU |4128.746000 |4129.324000 | -0.01 %| > |MergeStoreBench.setLongLV |2793.125000 |2794.438000 | -0.05 %| > |MergeStoreBench.setLongRB |14465.223000 |7015.050000 | 106.20 %| > |MergeStoreBench.setLongRBU |14546.954000 |6173.210000 | 135.65 %| > |MergeStoreBench.setLongRL |6816.145000 |6813.348000 | 0.04 %| > |MergeStoreBench.setLongRLU |4289.445000 |4284.239000 | 0.12 %| > |MergeStoreBench.setLongRU |3132.471000 |3133.093000 | -0.02 %| > |MergeStoreBench.setLongU |3086.779000 |3087.298000 | -0.02 %| > > AMD EPYC 9T24 > |name | before | after | ratio | > |---|---|---|---| > |MergeStoreBench.setChar... This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/23030 From fyang at openjdk.org Thu Mar 6 02:51:04 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 6 Mar 2025 02:51:04 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value [v13] In-Reply-To: References: <6XEmUiapz_UElQlM-x5g61YOp2DSqSh3b0Vdq1jsWx8=.c7cd0945-47e4-49cb-bbd4-e6b7bc06c743@github.com> Message-ID: On Thu, 6 Mar 2025 02:41:43 GMT, kuaiwei wrote: >> That works for me. Maybe you can update the code comments to make it more accurate? Like: >> `// Exclude riscv64 where ReverseBytes is only conditionally supported` > > Comments updated Thank you! 
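For readers skimming the thread, here is a minimal sketch of the kind of store sequence the 8347405 description refers to: four single-byte stores that write an `int` most-significant byte first, next to an equivalent form that reverses the bytes once and then stores least-significant byte first. This is not code from the PR or from MergeStoreBench; the class and method names are hypothetical, and whether C2 actually merges a given sequence depends on the platform and on the conditions the optimization checks.

```java
import java.util.Arrays;

public class ReverseOrderStoreSketch {
    // Four single-byte stores, most-significant byte first (big-endian layout).
    static void setIntBigEndian(byte[] dst, int off, int value) {
        dst[off]     = (byte) (value >> 24);
        dst[off + 1] = (byte) (value >> 16);
        dst[off + 2] = (byte) (value >> 8);
        dst[off + 3] = (byte) value;
    }

    // Same memory contents: reverse the bytes once, then store
    // least-significant byte first (little-endian layout).
    static void setIntReversedThenLittleEndian(byte[] dst, int off, int value) {
        int r = Integer.reverseBytes(value);
        dst[off]     = (byte) r;
        dst[off + 1] = (byte) (r >> 8);
        dst[off + 2] = (byte) (r >> 16);
        dst[off + 3] = (byte) (r >> 24);
    }

    public static void main(String[] args) {
        byte[] a = new byte[4];
        byte[] b = new byte[4];
        setIntBigEndian(a, 0, 0x11223344);
        setIntReversedThenLittleEndian(b, 0, 0x11223344);
        // Both arrays should hold the bytes 0x11, 0x22, 0x33, 0x44 in that order.
        System.out.println(Arrays.equals(a, b));
    }
}
```

The second method only demonstrates the equivalence that makes a single wide store plus a byte reversal possible on a little-endian machine; it is not a claim about the exact node shapes the patch matches.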
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1982605032 From dholmes at openjdk.org Thu Mar 6 05:56:54 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Mar 2025 05:56:54 GMT Subject: RFR: 8350982: -server|-client causes fatal exception on static JDK [v2] In-Reply-To: <2fluPJGNiu9SvOwq6MfyLch7lChTjPlOJh7dcrXxfa4=.55bdfd48-08dd-4f14-b683-88fccbc66e1e@github.com> References: <2fluPJGNiu9SvOwq6MfyLch7lChTjPlOJh7dcrXxfa4=.55bdfd48-08dd-4f14-b683-88fccbc66e1e@github.com> Message-ID: <1dx6hZk1YbPUl9q1rtXftVMUT1yHd4wwa2_aEqBMhTY=.ba3feb43-bd68-4803-8484-403c6b906861@github.com> On Thu, 6 Mar 2025 02:18:43 GMT, Jiangli Zhou wrote: >> Please review the `Arguments::parse_each_vm_init_arg` change to ignore`-server|-client` options, which avoids unrecognized option error on static JDK. >> >> On regular JDK, '-server|-client' options are processed/removed from command-line arguments by `CheckJvmType` during `CreateExecutionEnvironment`. That happens before `Arguments::parse_each_vm_init_arg` is called. With jvm.cfg setting, only server vm is known and client is ignored. So specifying '-server' and '-client' in command-line is really a no-op. >> >> On static JDK, the VM is statically linked with the launcher, and `CreateExecutionEnvironment` & `CheckJvmType` are not called. As the result, `Arguments::parse_each_vm_init_arg` could see `-server|-client` when running on static JDK, if the options are specified in the command line. > > Jiangli Zhou has updated the pull request incrementally with two additional commits since the last revision: > > - Remove '-server' from all following tests. 
> > Add @requires vm.flavor == "server" & !vm.emulatedClient since these tests run on c2: > - compiler/c2/TestReduceAllocationAndHeapDump.java > - compiler/inlining/InlineBimorphicVirtualCallAfterMorphismChanged.java > > These tests already have @requires vm.compiler2.enabled: > - compiler/c2/TestReduceAllocationAndLoadKlass.java > - compiler/c2/TestReduceAllocationAndNonExactAllocate.java > - compiler/c2/TestReduceAllocationAndNullableLoads.java > - compiler/c2/TestReduceAllocationAndPointerComparisons.java > - compiler/escapeAnalysis/TestIterativeEA.java > > Can run on c1/c2: > - compiler/escapeAnalysis/TestReduceAllocationAndNonReduciblePhi.java > > Already have @requires vm.flavor == "server": > - compiler/intrinsics/math/TestMinMaxIntrinsics.java > - compiler/profiling/TestTypeProfiling.java > - gc/stress/gcbasher/TestGCBasherWithG1.java > - gc/stress/gcbasher/TestGCBasherWithParallel.java > - gc/stress/gcbasher/TestGCBasherWithSerial.java > > Not compiler specific: > - runtime/CDSCompressedKPtrs/XShareAuto.java > - Revert src/hotspot/share/runtime/arguments.cpp. @jianglizhou this seems to now be a compiler team issue to review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23881#issuecomment-2702883441 From dholmes at openjdk.org Thu Mar 6 06:14:52 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Mar 2025 06:14:52 GMT Subject: RFR: 8350982: -server|-client causes fatal exception on static JDK [v2] In-Reply-To: <2fluPJGNiu9SvOwq6MfyLch7lChTjPlOJh7dcrXxfa4=.55bdfd48-08dd-4f14-b683-88fccbc66e1e@github.com> References: <2fluPJGNiu9SvOwq6MfyLch7lChTjPlOJh7dcrXxfa4=.55bdfd48-08dd-4f14-b683-88fccbc66e1e@github.com> Message-ID: <_mnGdermmAHz7Uc0ImXgngswbkF1RgXjycQKTQAiz9I=.da23ce88-3967-4891-9bd7-b2b0a04e76ab@github.com> On Thu, 6 Mar 2025 02:18:43 GMT, Jiangli Zhou wrote: >> Please review the `Arguments::parse_each_vm_init_arg` change to ignore `-server|-client` options, which avoids unrecognized option error on static JDK. 
>> >> On regular JDK, '-server|-client' options are processed/removed from command-line arguments by `CheckJvmType` during `CreateExecutionEnvironment`. That happens before `Arguments::parse_each_vm_init_arg` is called. With jvm.cfg setting, only server vm is known and client is ignored. So specifying '-server' and '-client' in command-line is really a no-op. >> >> On static JDK, the VM is statically linked with the launcher, and `CreateExecutionEnvironment` & `CheckJvmType` are not called. As the result, `Arguments::parse_each_vm_init_arg` could see `-server|-client` when running on static JDK, if the options are specified in the command line. > > Jiangli Zhou has updated the pull request incrementally with two additional commits since the last revision: > > - Remove '-server' from all following tests. > > Add @requires vm.flavor == "server" & !vm.emulatedClient since these tests run on c2: > - compiler/c2/TestReduceAllocationAndHeapDump.java > - compiler/inlining/InlineBimorphicVirtualCallAfterMorphismChanged.java > > These tests already have @requires vm.compiler2.enabled: > - compiler/c2/TestReduceAllocationAndLoadKlass.java > - compiler/c2/TestReduceAllocationAndNonExactAllocate.java > - compiler/c2/TestReduceAllocationAndNullableLoads.java > - compiler/c2/TestReduceAllocationAndPointerComparisons.java > - compiler/escapeAnalysis/TestIterativeEA.java > > Can run on c1/c2: > - compiler/escapeAnalysis/TestReduceAllocationAndNonReduciblePhi.java > > Already have @requires vm.flavor == "server": > - compiler/intrinsics/math/TestMinMaxIntrinsics.java > - compiler/profiling/TestTypeProfiling.java > - gc/stress/gcbasher/TestGCBasherWithG1.java > - gc/stress/gcbasher/TestGCBasherWithParallel.java > - gc/stress/gcbasher/TestGCBasherWithSerial.java > > Not compiler specific: > - runtime/CDSCompressedKPtrs/XShareAuto.java > - Revert src/hotspot/share/runtime/arguments.cpp. 
test/hotspot/jtreg/runtime/CDSCompressedKPtrs/XShareAuto.java line 43: > 41: public static void main(String[] args) throws Exception { > 42: ProcessBuilder pb = ProcessTools.createLimitedTestJavaProcessBuilder( > 43: "-XX:+UnlockDiagnosticVMOptions", This test was created specifically as a regression test because `-server` didn't work. I'm not sure it serves any purpose now. @iklam? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23881#discussion_r1982743025 From clanger at openjdk.org Thu Mar 6 06:18:52 2025 From: clanger at openjdk.org (Christoph Langer) Date: Thu, 6 Mar 2025 06:18:52 GMT Subject: RFR: 8350325: [PPC64] ConvF2HFIdealizationTests timeouts on Power8 In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 12:02:01 GMT, David Linus Briemann wrote: > Skip ConvF2HFIdealizationTests for Power8 Marked as reviewed by clanger (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23692#pullrequestreview-2663431578 From epeter at openjdk.org Thu Mar 6 06:51:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Mar 2025 06:51:04 GMT Subject: RFR: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears [v8] In-Reply-To: <1AKhJVL_TmLYFGqpwaMYXi1rMa0Whn0h5VCY3SSWC2E=.ac977d8c-32f3-4d62-95f1-80a1a5c5717e@github.com> References: <1AKhJVL_TmLYFGqpwaMYXi1rMa0Whn0h5VCY3SSWC2E=.ac977d8c-32f3-4d62-95f1-80a1a5c5717e@github.com> Message-ID: On Wed, 5 Mar 2025 19:58:11 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add more comments for Vladimir > > Good. Thanks @vnkozlov @chhagedorn for the reviews, and thanks to @rwestrel for the original suggestion to remove useless slow loops! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23865#issuecomment-2702965087 From epeter at openjdk.org Thu Mar 6 06:51:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Mar 2025 06:51:05 GMT Subject: Integrated: 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 13:38:55 GMT, Emanuel Peter wrote: > @rwestrel asked me for this here: > https://github.com/openjdk/jdk/pull/22016#issuecomment-2684365921 > > The idea: an unused `multiversion_if` (i.e. one where the `slow_loop` is still delayed, i.e. where the `fast_loop` has not yet added a runtime-check to the `multiversion_if`) can be constant-folded if the main `fast_loop` disappears. Because at that point we know that we will never add a new condition to the `multiversion_if`, and it will constant fold to true (towards the `fast_loop`) after loop-opts anyway. > > It also seems to fix a bug, where all multiversioned loops (fast / slow, pre/main/post) disappear, and then we are left with a if-diamond with the multiversion_if. This then hits assertion code in `PhaseIdealLoop::conditional_move`. This issue is also addressed with this patch here. I adjusted the code around that assert slightly to give better reporting and ensure we bail out of the optimization if we see an unexpected pattern. This pull request has now been integrated. 
Changeset: e82031ec Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/e82031ec1a8ae2478f83d009594d512a13fdb77e Stats: 234 lines in 7 files changed: 231 ins; 0 del; 3 mod 8350756: C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/23865 From epeter at openjdk.org Thu Mar 6 07:05:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Mar 2025 07:05:03 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value [v14] In-Reply-To: References: Message-ID: <219KgE5pWn3aVTmM3NIh1MWVBkf6Kz6Azn9TfAuAFUg=.21b9a427-5eee-4c36-a758-fa516c7c9eab@github.com> On Thu, 6 Mar 2025 02:45:55 GMT, kuaiwei wrote: >> This patch enhance MergeStores optimization to support merge value with reverse byte order. >> >> Below is benchmark result before and after the patch: >> >> On aliyun g8y (aarch64) >> |name | before | score2 | ratio | >> |---|---|---|---| >> |MergeStoreBench.setCharBS |5669.655000 |5669.566000 | 0.00 %| >> |MergeStoreBench.setCharBV |5516.911000 |5516.273000 | 0.01 %| >> |MergeStoreBench.setCharC |5578.644000 |5552.809000 | 0.47 %| >> |MergeStoreBench.setCharLS |5782.140000 |5779.264000 | 0.05 %| >> |MergeStoreBench.setCharLV |5496.403000 |5499.195000 | -0.05 %| >> |MergeStoreBench.setIntB |6087.703000 |2768.385000 | 119.90 %| >> |MergeStoreBench.setIntBU |6733.813000 |2950.240000 | 128.25 %| >> |MergeStoreBench.setIntBV |1362.233000 |1361.821000 | 0.03 %| >> |MergeStoreBench.setIntL |2834.785000 |2833.042000 | 0.06 %| >> |MergeStoreBench.setIntLU |2947.145000 |2946.874000 | 0.01 %| >> |MergeStoreBench.setIntLV |5506.791000 |5506.229000 | 0.01 %| >> |MergeStoreBench.setIntRB |7634.279000 |5611.058000 | 36.06 %| >> |MergeStoreBench.setIntRBU |7766.737000 |5551.281000 | 39.91 %| >> |MergeStoreBench.setIntRL |5689.793000 |5689.385000 | 0.01 %| >> |MergeStoreBench.setIntRLU |5628.287000 |5628.789000 | -0.01 %| >> 
|MergeStoreBench.setIntRU |5536.039000 |5534.910000 | 0.02 %| >> |MergeStoreBench.setIntU |5595.363000 |5567.810000 | 0.49 %| >> |MergeStoreBench.setLongB |13722.671000 |6811.098000 | 101.48 %| >> |MergeStoreBench.setLongBU |13728.844000 |4280.240000 | 220.75 %| >> |MergeStoreBench.setLongBV |2785.255000 |2785.949000 | -0.02 %| >> |MergeStoreBench.setLongL |5714.615000 |5710.402000 | 0.07 %| >> |MergeStoreBench.setLongLU |4128.746000 |4129.324000 | -0.01 %| >> |MergeStoreBench.setLongLV |2793.125000 |2794.438000 | -0.05 %| >> |MergeStoreBench.setLongRB |14465.223000 |7015.050000 | 106.20 %| >> |MergeStoreBench.setLongRBU |14546.954000 |6173.210000 | 135.65 %| >> |MergeStoreBench.setLongRL |6816.145000 |6813.348000 | 0.04 %| >> |MergeStoreBench.setLongRLU |4289.445000 |4284.239000 | 0.12 %| >> |MergeStoreBench.setLongRU |3132.471000 |3133.093000 | -0.02 %| >> |MergeStoreBench.setLongU |3086.779000 |3087.298000 | -0.02 %| >> >> AMD EPYC 9T24 >> ... > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Update riscv comments Still good for me. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23030#pullrequestreview-2663515472 From iklam at openjdk.org Thu Mar 6 07:06:03 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 6 Mar 2025 07:06:03 GMT Subject: RFR: 8350982: -server|-client causes fatal exception on static JDK [v2] In-Reply-To: <_mnGdermmAHz7Uc0ImXgngswbkF1RgXjycQKTQAiz9I=.da23ce88-3967-4891-9bd7-b2b0a04e76ab@github.com> References: <2fluPJGNiu9SvOwq6MfyLch7lChTjPlOJh7dcrXxfa4=.55bdfd48-08dd-4f14-b683-88fccbc66e1e@github.com> <_mnGdermmAHz7Uc0ImXgngswbkF1RgXjycQKTQAiz9I=.da23ce88-3967-4891-9bd7-b2b0a04e76ab@github.com> Message-ID: On Thu, 6 Mar 2025 06:04:58 GMT, David Holmes wrote: >> Jiangli Zhou has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove '-server' from all following tests. 
>> >> Add @requires vm.flavor == "server" & !vm.emulatedClient since these tests run on c2: >> - compiler/c2/TestReduceAllocationAndHeapDump.java >> - compiler/inlining/InlineBimorphicVirtualCallAfterMorphismChanged.java >> >> These tests already have @requires vm.compiler2.enabled: >> - compiler/c2/TestReduceAllocationAndLoadKlass.java >> - compiler/c2/TestReduceAllocationAndNonExactAllocate.java >> - compiler/c2/TestReduceAllocationAndNullableLoads.java >> - compiler/c2/TestReduceAllocationAndPointerComparisons.java >> - compiler/escapeAnalysis/TestIterativeEA.java >> >> Can run on c1/c2: >> - compiler/escapeAnalysis/TestReduceAllocationAndNonReduciblePhi.java >> >> Already have @requires vm.flavor == "server": >> - compiler/intrinsics/math/TestMinMaxIntrinsics.java >> - compiler/profiling/TestTypeProfiling.java >> - gc/stress/gcbasher/TestGCBasherWithG1.java >> - gc/stress/gcbasher/TestGCBasherWithParallel.java >> - gc/stress/gcbasher/TestGCBasherWithSerial.java >> >> Not compiler specific: >> - runtime/CDSCompressedKPtrs/XShareAuto.java >> - Revert src/hotspot/share/runtime/arguments.cpp. > > test/hotspot/jtreg/runtime/CDSCompressedKPtrs/XShareAuto.java line 43: > >> 41: public static void main(String[] args) throws Exception { >> 42: ProcessBuilder pb = ProcessTools.createLimitedTestJavaProcessBuilder( >> 43: "-XX:+UnlockDiagnosticVMOptions", > > This test was created specifically as a regression test because `-server` didn't work. I'm not sure it serves any purpose now. @iklam? I think the test still serves as a functional test: if an archive is specified, then -Xshare:auto is used by default. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23881#discussion_r1982800754 From alanb at openjdk.org Thu Mar 6 07:46:56 2025 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 6 Mar 2025 07:46:56 GMT Subject: RFR: 8350982: -server|-client causes fatal exception on static JDK [v2] In-Reply-To: References: <2fluPJGNiu9SvOwq6MfyLch7lChTjPlOJh7dcrXxfa4=.55bdfd48-08dd-4f14-b683-88fccbc66e1e@github.com> <_mnGdermmAHz7Uc0ImXgngswbkF1RgXjycQKTQAiz9I=.da23ce88-3967-4891-9bd7-b2b0a04e76ab@github.com> Message-ID: On Thu, 6 Mar 2025 07:03:53 GMT, Ioi Lam wrote: >> test/hotspot/jtreg/runtime/CDSCompressedKPtrs/XShareAuto.java line 43: >> >>> 41: public static void main(String[] args) throws Exception { >>> 42: ProcessBuilder pb = ProcessTools.createLimitedTestJavaProcessBuilder( >>> 43: "-XX:+UnlockDiagnosticVMOptions", >> >> This test was created specifically as a regression test because `-server` didn't work. I'm not sure it serves any purpose now. @iklam? > > I think the test still serves as a functional test: if an archive is specified, then -Xshare:auto is used by default. Would it be better to use @requires !jdk.static for now so this test isn't selected when testing static builds? There is work further down the line to figure out a story for static builds + AOT as the ultimate goal is to run jlink and produce a single executable so there wouldn't be a separate AOT cache/archive on the file system. It's much further down the road, not clear that it's worth trying to test intermediate tests with -Xshare:auto at this time. 
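A minimal sketch of how the `@requires` tags discussed in this thread sit in a jtreg test header. The test class below is hypothetical and only prints the VM name; the `vm.flavor == "server" & !vm.emulatedClient` expression is the one quoted in the commit message, and `!jdk.static` is the alternative suggested above for keeping a test out of static-build runs, shown here only as a comment.

```java
/*
 * @test
 * @summary Hypothetical example: selected only when the server VM is in use,
 *          so the test itself does not need to pass -server on the command line.
 * @requires vm.flavor == "server" & !vm.emulatedClient
 * @run main RequiresSketch
 */
// To exclude a test from static-JDK runs instead, the thread suggests a tag
// along the lines of: @requires !jdk.static
public class RequiresSketch {
    // java.vm.name is a standard system property; it is used here only so the
    // sketch has something observable to print.
    static String vmName() {
        return System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println("Running on: " + vmName());
    }
}
```

With a tag like this, jtreg decides up front whether the test applies to the VM under test, which is why the explicit `-server` argument can be dropped from the test's own command line.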
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23881#discussion_r1982845616 From epeter at openjdk.org Thu Mar 6 09:05:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Mar 2025 09:05:53 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 18:45:52 GMT, Liam Miller-Cushon wrote: > Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. > > https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. > This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these > transitive uses to the worklist. > > The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for > CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". > We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. src/hotspot/share/opto/phaseX.cpp line 2022: > 2020: } > 2021: }; > 2022: to_push->visit_uses(push_and_uses_to_worklist, is_boundary); And why not just call this line in two places, rather than having to work with `to_push`? Would that not be less code? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23871#discussion_r1982961208 From bkilambi at openjdk.org Thu Mar 6 09:17:53 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 6 Mar 2025 09:17:53 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: <17arBLP__LxvP0MS9liI_o9HTVpzxDJrjy3LjYNn8Ng=.8be17d07-09b0-4d31-b41f-66d3c9be5fad@github.com> <2H_Ol6dl2XhWKowrqvJbdAEFoHWYNu65dR60bjkIaPQ=.879025d0-d0bc-49d1-94e3-da69666c372c@github.com> Message-ID: On Wed, 5 Mar 2025 10:03:31 GMT, Xiaohong Gong wrote: >> Hi @XiaohongGong , thanks for testing this variation. I also expected it to have relatively better performance due to the absence of the load instruction. It might help in larger real-world workloads where reducing some load instructions or having fewer instructions can help performance (by reducing pressure on icache/iTLB). >> Thinking of aarch64 Neon machines that we can test this on - we have only N1, V2 (Grace) machines which have support for 128-bit Neon. V1 is 256 bit Neon/SVE which will execute the `sve tbl` instruction instead. I can of course disable SVE and run the Neon instructions on V1 but I don't think that would really make any difference. So for 128-bit Neon machines, I can also test only on N1 and V2 which you've already done. Do you have a specific machine in mind that you'd like this to be tested on? > > Thanks for the clarification @Bhavana-Kilambi . I agree with you that it may not make any difference on other machines. So do you suggest that I change the pattern right now, or revisit this part once we hit a performance issue in other real-world workloads? Sure, I am fine with going ahead with the current implementation and revisiting it if we encounter any performance issues. Thanks for testing. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r1982979067 From bkilambi at openjdk.org Thu Mar 6 09:17:53 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 6 Mar 2025 09:17:53 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: <17arBLP__LxvP0MS9liI_o9HTVpzxDJrjy3LjYNn8Ng=.8be17d07-09b0-4d31-b41f-66d3c9be5fad@github.com> <2H_Ol6dl2XhWKowrqvJbdAEFoHWYNu65dR60bjkIaPQ=.879025d0-d0bc-49d1-94e3-da69666c372c@github.com> Message-ID: On Thu, 6 Mar 2025 09:14:19 GMT, Bhavana Kilambi wrote: >> Thanks for the clarification @Bhavana-Kilambi . I agree with you that it may not make any difference on other machines. So do you suggest that I change the pattern right now, or revisit this part once we hit a performance issue in other real-world workloads? > > Sure, I am fine with going ahead with the current implementation and revisiting it if we encounter any performance issues. Thanks for testing. Have all the vectorAPI JTREG tests been tested on N1 and Grace? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r1982980774 From xgong at openjdk.org Thu Mar 6 09:26:53 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 6 Mar 2025 09:26:53 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: <17arBLP__LxvP0MS9liI_o9HTVpzxDJrjy3LjYNn8Ng=.8be17d07-09b0-4d31-b41f-66d3c9be5fad@github.com> <2H_Ol6dl2XhWKowrqvJbdAEFoHWYNu65dR60bjkIaPQ=.879025d0-d0bc-49d1-94e3-da69666c372c@github.com> Message-ID: On Thu, 6 Mar 2025 09:15:29 GMT, Bhavana Kilambi wrote: > Have all the vectorAPI JTREG tests been tested on N1 and Grace? Yes, of course. I tested all Vector API related jtreg tests with both NEON and SVE. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r1982995626 From bkilambi at openjdk.org Thu Mar 6 09:31:54 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 6 Mar 2025 09:31:54 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 01:18:57 GMT, Xiaohong Gong wrote: > The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. > > Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. > > This patch added the rearrange support for vector types with small lane count. Here are the main changes: > - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) > - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation > - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one > > Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. 
> > 1) NEON > > JMH on panama-vector:vectorIntrinsics: > > Benchmark (size) Mode Cnt Units Before After Gain > Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x > Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x > Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x > Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x > Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x > Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x > Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x > Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x > Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x > Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x > Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x > Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.999 747.848 10.38x > > > JMH on jdk mainline: > > Benchmark ... Looks good to me. ------------- Marked as reviewed by bkilambi (Author). PR Review: https://git.openjdk.org/jdk/pull/23790#pullrequestreview-2663857378 From xgong at openjdk.org Thu Mar 6 09:36:58 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 6 Mar 2025 09:36:58 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 09:28:48 GMT, Bhavana Kilambi wrote: > Looks good to me. Thanks so much for your review @Bhavana-Kilambi ! 
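For context, a scalar sketch of what a rearrange does on a two-lane vector such as the `Long128Vector` case benchmarked in this thread: each output lane takes the input lane named by the shuffle. This models the lane semantics only, in plain Java with hypothetical names; it does not use the Vector API (an incubator module) and says nothing about the NEON or SVE instructions the patch emits.

```java
import java.util.Arrays;

public class RearrangeSketch {
    // Scalar model: output lane i is taken from input lane shuffle[i].
    static long[] rearrange(long[] lanes, int[] shuffle) {
        long[] out = new long[lanes.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = lanes[shuffle[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] v = {10L, 20L};   // two 64-bit lanes, as in a 128-bit long vector
        int[] swap = {1, 0};     // shuffle that exchanges the two lanes
        System.out.println(Arrays.toString(rearrange(v, swap))); // prints [20, 10]
    }
}
```

The two-lane case is the one the PR description says was previously unsupported on AArch64, which is why the sketch uses it rather than a wider vector.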
------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2703300006 From adinn at openjdk.org Thu Mar 6 10:14:59 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 6 Mar 2025 10:14:59 GMT Subject: Integrated: 8351256: Improve printing of runtime call stub names in disassember output In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 09:13:02 GMT, Andrew Dinn wrote: > Fixes printing of runtime stub call targets in disassembler listings. This pull request has now been integrated. Changeset: cfab88b1 Author: Andrew Dinn URL: https://git.openjdk.org/jdk/commit/cfab88b1a2351a187bc1be153be96ca983a7776c Stats: 53 lines in 2 files changed: 51 ins; 0 del; 2 mod 8351256: Improve printing of runtime call stub names in disassember output Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/23915 From bulasevich at openjdk.org Thu Mar 6 12:15:52 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 6 Mar 2025 12:15:52 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v14] In-Reply-To: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: > This change relocates mutable data (such as relocations, metadata, jvmci data) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data out of the CodeCache. > > OOPs were initially moved to a new mutable data blob, but then moved back to nmethod due to performance issues on dacapo benchmarks on aarch with ShenandoahGC (why Shenandoah: it is the only GC with supports_instruction_patching=false - it requires loading from the oops table in compiled code, which takes three instructions for remote data). 
> > Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. > > The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod. Example (statistics collected on the CodeCacheStress benchmark): > - nmethod_count:134000, total_compilation_time: 510460ms > - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, > - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB > > Functional testing: jtreg on arm/aarch/x86. > Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. > > Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - swap matadata and jvmci data in outputs according to data layout - cleanup - returning oops back to nmethods. jtreg: Ok, performance: Ok. todo: cleanup - Address review comments: cleanup, move fields to avoid padding, fix CodeBlob purge to call os::free, fix nmethod::print, update Layout description - add a separate adrp_movk function to to support targets located more than 4GB away - Force the use of movk in combination with adrp and ldr instructions to address scenarios where os::malloc allocates buffers beyond the typical ±4GB range accessible with adrp - Fixing TestFindInstMemRecursion test fail with XX:+StressReflectiveCode option: _relocation_size can exceed 64Kb, in this case _metadata_offset do not fit into int16. Fix: use _oops_size int16 field to calculate metadata offset - removing dead code - a bit of cleanup and addressing review suggestions - rework movoop for not_supports_instruction_patching case: correcting in ldr_constant and relocations fixup - ... 
and 5 more: https://git.openjdk.org/jdk/compare/cfab88b1...bc8c590c ------------- Changes: https://git.openjdk.org/jdk/pull/21276/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21276&range=13 Stats: 192 lines in 7 files changed: 87 ins; 37 del; 68 mod Patch: https://git.openjdk.org/jdk/pull/21276.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21276/head:pull/21276 PR: https://git.openjdk.org/jdk/pull/21276 From mli at openjdk.org Thu Mar 6 12:21:03 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 6 Mar 2025 12:21:03 GMT Subject: RFR: 8351345: Test: Print out non-whitelisted JTreg VM or Javaoptions flag Message-ID: Hi, Can you help to review this trivial patch? Currently, it only reports that some flag is non-whitelisted, but does not print out the flag explicitly, but it's helpful to do so. Thanks! ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/23931/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23931&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351345 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23931.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23931/head:pull/23931 PR: https://git.openjdk.org/jdk/pull/23931 From mli at openjdk.org Thu Mar 6 13:23:29 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 6 Mar 2025 13:23:29 GMT Subject: RFR: 8351348: x86_64: remove redundant supports_float16 check Message-ID: Hi, Can you help to review this simple patch? `supports_float16()` is invoked in `is_intrinsic_available -> is_intrinsic_supported`, so there is no need to call it explicitly. 
Thanks ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/23932/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23932&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351348 Stats: 8 lines in 1 file changed: 0 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23932.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23932/head:pull/23932 PR: https://git.openjdk.org/jdk/pull/23932 From bulasevich at openjdk.org Thu Mar 6 13:30:00 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 6 Mar 2025 13:30:00 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v14] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Thu, 6 Mar 2025 12:15:52 GMT, Boris Ulasevich wrote: >> This change relocates mutable data (such as relocations, metadata, jvmci data) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data out of the CodeCache. >> >> OOPs were initially moved to a new mutable data blob, but then moved back to nmethod due to performance issues on dacapo benchmarks on aarch with ShenandoahGC (why Shenandoah: it is the only GC with supports_instruction_patching=false - it requires loading from the oops table in compiled code, which takes three instructions for remote data). >> >> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1–2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. >> >> The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod.
Example (statistics collected on the CodeCacheStress benchmark): >> - nmethod_count:134000, total_compilation_time: 510460ms >> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, >> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB >> >> Functional testing: jtreg on arm/aarch/x86. >> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. >> >> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. > > Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - swap matadata and jvmci data in outputs according to data layout > - cleanup > - returning oops back to nmethods. jtreg: Ok, performance: Ok. todo: cleanup > - Address review comments: cleanup, move fields to avoid padding, fix CodeBlob purge to call os::free, fix nmethod::print, update Layout description > - add a separate adrp_movk function to to support targets located more than 4GB away > - Force the use of movk in combination with adrp and ldr instructions to address scenarios > where os::malloc allocates buffers beyond the typical ±4GB range accessible with adrp > - Fixing TestFindInstMemRecursion test fail with XX:+StressReflectiveCode option: > _relocation_size can exceed 64Kb, in this case _metadata_offset do not fit into int16. > Fix: use _oops_size int16 field to calculate metadata offset > - removing dead code > - a bit of cleanup and addressing review suggestions > - rework movoop for not_supports_instruction_patching case: correcting in ldr_constant and relocations fixup > - ... and 5 more: https://git.openjdk.org/jdk/compare/cfab88b1...bc8c590c > Please swap `matadata` and `jvmci data` in outputs ... > > Also please merge latest JDK which have SA cleanup related to compilers: #23782 Yes. Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2703854539 From cushon at openjdk.org Thu Mar 6 16:05:38 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Thu, 6 Mar 2025 16:05:38 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v2] In-Reply-To: References: Message-ID: > Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. > > https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. > This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these > transitive uses to the worklist. > > The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for > CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". > We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. Liam Miller-Cushon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - copyright - style - Merge branch 'openjdk:master' into mernst/JDK-8350563 - RegTest - Merge branch 'openjdk:master' into mernst/JDK-8350563 - push `con->(cast*)->and` uses ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23871/files - new: https://git.openjdk.org/jdk/pull/23871/files/ded43c5e..a1d7826a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=00-01 Stats: 22846 lines in 760 files changed: 9209 ins; 9806 del; 3831 mod Patch: https://git.openjdk.org/jdk/pull/23871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23871/head:pull/23871 PR: https://git.openjdk.org/jdk/pull/23871 From cushon at openjdk.org Thu Mar 6 16:11:00 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Thu, 6 Mar 2025 16:11:00 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v2] In-Reply-To: References: Message-ID: <7rg1ERKAznG-pe3jqdi1Jg8Sfm2A4yaSopJQ2E2SAhM=.586d69ee-d823-4b9b-a727-a8505376c39d@github.com> On Thu, 6 Mar 2025 16:05:38 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. >> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". >> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. 
> > Liam Miller-Cushon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - copyright > - style > - Merge branch 'openjdk:master' into mernst/JDK-8350563 > - RegTest > - Merge branch 'openjdk:master' into mernst/JDK-8350563 > - push `con->(cast*)->and` uses Updated to address review comments, and add a test ------------- PR Comment: https://git.openjdk.org/jdk/pull/23871#issuecomment-2704290178 From cushon at openjdk.org Thu Mar 6 16:11:02 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Thu, 6 Mar 2025 16:11:02 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v2] In-Reply-To: References: Message-ID: <5VWfJqQaLA-6p7HSBGXF1MeeWODpBrQkXO-Nmhc70S4=.266a8629-b9b8-4f45-a890-51e9aeb2139b@github.com> On Thu, 6 Mar 2025 09:02:50 GMT, Emanuel Peter wrote: >> Liam Miller-Cushon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - copyright >> - style >> - Merge branch 'openjdk:master' into mernst/JDK-8350563 >> - RegTest >> - Merge branch 'openjdk:master' into mernst/JDK-8350563 >> - push `con->(cast*)->and` uses > > src/hotspot/share/opto/phaseX.cpp line 2022: > >> 2020: } >> 2021: }; >> 2022: to_push->visit_uses(push_and_uses_to_worklist, is_boundary); > > And why not just call this line in two places, rather than having to work with `to_push`? Would that not be less code? Matthias reports: That was purely a stylistic choice - the two cases can be integrated, we can push from "use" in both cases. Updated. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23871#discussion_r1983652245 From dfenacci at openjdk.org Thu Mar 6 16:23:03 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 6 Mar 2025 16:23:03 GMT Subject: RFR: 8348645: IGV: visualize live ranges [v4] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 13:22:17 GMT, Roberto Castañeda Lozano wrote: >> This changeset extends IGV with live range visualization. It introduces live ranges as first-class IGV entities and displays them along with the control-flow graph in the CFG view. Visualizing liveness information should hopefully make C2's register allocator easier to understand, diagnose, debug, and enhance. >> >> Live ranges are visible in C2 phases where liveness information is available, that is, phases `Initial liveness` to `Fix up spills` at IGV print level 4 or greater. For example, running a debug build of the JVM as follows: >> >> >> java -Xbatch -XX:CompileCommand=IGVPrintLevel,java.util.HashMap::newNode,4 >> >> >> produces the following visualization for the `Initial spilling` phase: >> >> ![initial-spilling](https://github.com/user-attachments/assets/1ecf74f5-92a8-4866-b1ec-2323bb0c428e) >> >> Live ranges are first-class IGV entities, meaning that the user can: >> >> - search, select, and extract them; >> >> ![search-extract](https://github.com/user-attachments/assets/8e0dfa59-457f-49cb-b2b5-1d202301c79d) >> >> - examine their properties in the `Properties` window or via tooltips; >> >> ![properties](https://github.com/user-attachments/assets/68d2d23b-b986-4d2e-835c-b661bce0de23) >> >> - navigate to related IGV entities via a pop-up menu; and >> >> ![popup](https://github.com/user-attachments/assets/21de2fef-d36a-42d5-b828-2696d87a18ea) >> >> - program filters that act on them according to their properties.
>> >> ![filters](https://github.com/user-attachments/assets/e993b067-d0b8-452c-a885-c4e601e31e1c) >> >> Live ranges are connected to nodes by a use-def relation: a node can define zero or one live ranges, and use multiple live ranges; a live range can be defined and used by multiple nodes. Consequently, a live range in IGV is visible if and only if all its related nodes are visible (fully or semi-transparently). Generally, the start and end of a live range are vertically aligned with the nodes that first define and last use the live range. To reflect accurately the semantics of Phi nodes w.r.t. liveness, the visualization treats live ranges related by Phi nodes specially: live ranges used by a Phi node end at the bottom of the corresponding predecessor basic blocks, whereas live ranges defined by a Phi node start at the top of the node's basic block. The following screenshot shows an example of a Phi node (`48 Phi`) joining live ranges `L8` and `L13` into `L15`: >> >> ![phi](https://github.com/user-attachments/assets/0ef8aa1d-523d-4391-982e-6b74c2016a3c... > > Roberto Castañeda Lozano has updated the pull request incrementally with three additional commits since the last revision: > > - Handle single-block CFGs > - Open and close live ranges joined by phis in their respective blocks > - Export liveness information when saving a graph from IGV Just a minor aesthetic thing: I noticed that in phases with no liveness information, the liveness information in each node is replaced by an empty space (instead of nothing): [image] instead of [image] Otherwise it looks good to me (I probably made more of a functionality rather than a code style/semantics kind of review). Thanks again @robcasloz. ------------- Marked as reviewed by dfenacci (Committer).
PR Review: https://git.openjdk.org/jdk/pull/23558#pullrequestreview-2664983150 From jiangli at openjdk.org Thu Mar 6 17:57:57 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 6 Mar 2025 17:57:57 GMT Subject: RFR: 8350982: -server|-client causes fatal exception on static JDK [v2] In-Reply-To: References: <2fluPJGNiu9SvOwq6MfyLch7lChTjPlOJh7dcrXxfa4=.55bdfd48-08dd-4f14-b683-88fccbc66e1e@github.com> <_mnGdermmAHz7Uc0ImXgngswbkF1RgXjycQKTQAiz9I=.da23ce88-3967-4891-9bd7-b2b0a04e76ab@github.com> Message-ID: On Thu, 6 Mar 2025 07:44:43 GMT, Alan Bateman wrote: >> I think the test still serves as a functional test: if an archive is specified, then -Xshare:auto is used by default. > > Would it be better to use @requires !jdk.static for now so this test isn't selected when testing static builds? There is work further down the line to figure out a story for static builds + AOT as the ultimate goal is to run jlink and produce a single executable so there wouldn't be a separate AOT cache/archive on the file system. It's much further down the road, not clear that it's worth trying to test intermediate tests with -Xshare:auto at this time. Looking back at the history, the `-Xshare:auto` & `-server` issue (https://bugs.openjdk.org/browse/JDK-8005933) was related to this code in JDK 8: https://github.com/openjdk/jdk8u-dev/commit/a1f3a95880318c32169ff89dc8784a0dfc629eec In JDK 8, on c2 sharedSpaces is disabled if the following condition is true. !DumpSharedSpaces && !RequireSharedSpaces && (FLAG_IS_DEFAULT(UseSharedSpaces) || !UseSharedSpaces) That code no longer exists in mainline. The `-server` option has become a no-op. The launcher code, `CheckJvmType`, removes `-server` from the command line arguments passed to the VM side. As a result, `-server` has no real impact on CDS now.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23881#discussion_r1983820332 From kvn at openjdk.org Thu Mar 6 18:32:01 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 6 Mar 2025 18:32:01 GMT Subject: RFR: 8351348: x86_64: remove redundant supports_float16 check In-Reply-To: References: Message-ID: <57L7Sm9va-I6MRDFKHo3xKrL3Xz72RJcmllfGP8LliE=.cd6094a4-e3a7-4742-bdd6-1e00d0c5dfa3@github.com> On Thu, 6 Mar 2025 13:18:07 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > `supports_float16()` is invoked in `is_intrinsic_available -> is_intrinsic_supported`, so there is no need to call it explicitly. > > Thanks I am on the fence about this. You can consider it a "shortcut" to quickly check the availability of the feature. It will be called 2 times without it (I am not sure calls will be inlined and optimized). ------------- PR Review: https://git.openjdk.org/jdk/pull/23932#pullrequestreview-2665306856 From kvn at openjdk.org Thu Mar 6 18:34:01 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 6 Mar 2025 18:34:01 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v14] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Thu, 6 Mar 2025 13:27:45 GMT, Boris Ulasevich wrote: >> Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - swap matadata and jvmci data in outputs according to data layout >> - cleanup >> - returning oops back to nmethods. jtreg: Ok, performance: Ok.
todo: cleanup >> - Address review comments: cleanup, move fields to avoid padding, fix CodeBlob purge to call os::free, fix nmethod::print, update Layout description >> - add a separate adrp_movk function to to support targets located more than 4GB away >> - Force the use of movk in combination with adrp and ldr instructions to address scenarios >> where os::malloc allocates buffers beyond the typical ±4GB range accessible with adrp >> - Fixing TestFindInstMemRecursion test fail with XX:+StressReflectiveCode option: >> _relocation_size can exceed 64Kb, in this case _metadata_offset do not fit into int16. >> Fix: use _oops_size int16 field to calculate metadata offset >> - removing dead code >> - a bit of cleanup and addressing review suggestions >> - rework movoop for not_supports_instruction_patching case: correcting in ldr_constant and relocations fixup >> - ... and 5 more: https://git.openjdk.org/jdk/compare/cfab88b1...bc8c590c > >> Please swap `matadata` and `jvmci data` in outputs ... >> >> Also please merge latest JDK which have SA cleanup related to compilers: #23782 > > Yes. Thanks! @bulasevich is it ready for testing now? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2704637811 From kvn at openjdk.org Thu Mar 6 18:56:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 6 Mar 2025 18:56:51 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 22:23:23 GMT, Evgeny Astigeevich wrote: > This benchmark is used to check performance impact of the code cache being sparse. > > We use C2 compiler to compile the same Java method multiple times to produce as much code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size.
CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods. > > Results: code region size 2M (2097152) bytes > - Intel Xeon Platinum 8259CL > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |19.577 |0.619 |us/op | | > |128 |32 |4 |22.968 |0.314 |us/op |17.30% | > |128 |48 |3 |22.245 |0.388 |us/op |13.60% | > |128 |64 |2 |23.874 |0.84 |us/op |21.90% | > |128 |80 |2 |23.786 |0.231 |us/op |21.50% | > |128 |96 |1 |26.224 |1.16 |us/op |34% | > |128 |112 |1 |27.028 |0.461 |us/op |38.10% | > |256 |1 |256 |47.43 |1.146 |us/op | | > |256 |32 |8 |63.962 |1.671 |us/op |34.90% | > |256 |48 |5 |63.396 |0.247 |us/op |33.70% | > |256 |64 |4 |66.604 |2.286 |us/op |40.40% | > |256 |80 |3 |59.746 |1.273 |us/op |26% | > |256 |96 |3 |63.836 |1.034 |us/op |34.60% | > |256 |112 |2 |63.538 |1.814 |us/op |34% | > |512 |1 |512 |172.731 |4.409 |us/op | | > |512 |32 |16 |206.772 |6.229 |us/op |19.70% | > |512 |48 |11 |215.275 |2.228 |us/op |24.60% | > |512 |64 |8 |212.962 |2.028 |us/op |23.30% | > |512 |80 |6 |201.335 |12.519 |us/op |16.60% | > |512 |96 |5 |198.133 |6.502 |us/op |14.70% | > |512 |112 |5 |193.739 |3.812 |us/op |12.20% | > |768 |1 |768 |325.154 |5.048 |us/op | | > |768 |32 |24 |346.298 |20.196 |us/op |6.50% | > |768 |48 |16 |350.746 |2.931 |us/op |7.90% | > |768 |64 |12 |339.445 |7.927 |us/op |4.40% | > |768 |80 |10 |347.408 |7.355 |us/op |6.80% | > |768 |96 |8 |340.983 |3.578 |us/op |4.90% | > |768 |112 |7 |353.949 |2.98 |us/op |8.90% | > |1024 |1 |1024 |368.352 |5.961 |us/op | | > |1024 |32 |32 |463.822 |6.274 |us/op |25.90% | > |1024 |48 |21 |457.674 |15.144 |us/op |24.20% | > |1024 |64 |16 |477.694 |0.986 |us/op |29.70% | > |1024 |80 |13 |484.901 |32.601 |us/op |31.60% | > |1024 |96 |11 |480.8 |27.088 |us/op |30.50% | > |1024 |112 |9 |474.416 |10.053 |us/op |28.80% | > > - AArch64 Neoverse N1 > > |activeMethodCount |groupCount 
|Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |25.297 |0.792 |us/op | | > |128 |32 |4 |31.451... I think if nmethods have the same size there could be effects on CPU caches when reading instructions because of the regular pattern they will have in the CodeCache. Maybe it can be achieved by adding blobs of different sizes between nmethods. You already use blobs to align group's start address. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2704687818 From kvn at openjdk.org Thu Mar 6 19:03:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 6 Mar 2025 19:03:54 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 22:23:23 GMT, Evgeny Astigeevich wrote: > This benchmark is used to check performance impact of the code cache being sparse. > > We use C2 compiler to compile the same Java method multiple times to produce as much code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods.
> > Results: code region size 2M (2097152) bytes > - Intel Xeon Platinum 8259CL > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |19.577 |0.619 |us/op | | > |128 |32 |4 |22.968 |0.314 |us/op |17.30% | > |128 |48 |3 |22.245 |0.388 |us/op |13.60% | > |128 |64 |2 |23.874 |0.84 |us/op |21.90% | > |128 |80 |2 |23.786 |0.231 |us/op |21.50% | > |128 |96 |1 |26.224 |1.16 |us/op |34% | > |128 |112 |1 |27.028 |0.461 |us/op |38.10% | > |256 |1 |256 |47.43 |1.146 |us/op | | > |256 |32 |8 |63.962 |1.671 |us/op |34.90% | > |256 |48 |5 |63.396 |0.247 |us/op |33.70% | > |256 |64 |4 |66.604 |2.286 |us/op |40.40% | > |256 |80 |3 |59.746 |1.273 |us/op |26% | > |256 |96 |3 |63.836 |1.034 |us/op |34.60% | > |256 |112 |2 |63.538 |1.814 |us/op |34% | > |512 |1 |512 |172.731 |4.409 |us/op | | > |512 |32 |16 |206.772 |6.229 |us/op |19.70% | > |512 |48 |11 |215.275 |2.228 |us/op |24.60% | > |512 |64 |8 |212.962 |2.028 |us/op |23.30% | > |512 |80 |6 |201.335 |12.519 |us/op |16.60% | > |512 |96 |5 |198.133 |6.502 |us/op |14.70% | > |512 |112 |5 |193.739 |3.812 |us/op |12.20% | > |768 |1 |768 |325.154 |5.048 |us/op | | > |768 |32 |24 |346.298 |20.196 |us/op |6.50% | > |768 |48 |16 |350.746 |2.931 |us/op |7.90% | > |768 |64 |12 |339.445 |7.927 |us/op |4.40% | > |768 |80 |10 |347.408 |7.355 |us/op |6.80% | > |768 |96 |8 |340.983 |3.578 |us/op |4.90% | > |768 |112 |7 |353.949 |2.98 |us/op |8.90% | > |1024 |1 |1024 |368.352 |5.961 |us/op | | > |1024 |32 |32 |463.822 |6.274 |us/op |25.90% | > |1024 |48 |21 |457.674 |15.144 |us/op |24.20% | > |1024 |64 |16 |477.694 |0.986 |us/op |29.70% | > |1024 |80 |13 |484.901 |32.601 |us/op |31.60% | > |1024 |96 |11 |480.8 |27.088 |us/op |30.50% | > |1024 |112 |9 |474.416 |10.053 |us/op |28.80% | > > - AArch64 Neoverse N1 > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |25.297 |0.792 |us/op | | > 
|128 |32 |4 |31.451... Hmm, I am wondering what effect you are measuring. You are executing methods sequentially. Is it possible the effect you see is CodeCache page loading? The more groups you have, the more space they take in CodeCache. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2704701515 From sparasa at openjdk.org Thu Mar 6 19:15:38 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 6 Mar 2025 19:15:38 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v5] In-Reply-To: References: Message-ID: > The goal of this PR is to generate code using APX NDD instructions. Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Reorder to dst, src for negI and negL instructions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23501/files - new: https://git.openjdk.org/jdk/pull/23501/files/c578c936..b09f1a01 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=03-04 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23501/head:pull/23501 PR: https://git.openjdk.org/jdk/pull/23501 From sparasa at openjdk.org Thu Mar 6 20:06:06 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 6 Mar 2025 20:06:06 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v6] In-Reply-To: References: Message-ID: > The goal of this PR is to generate code using APX NDD instructions.
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: remove APX support when bmi2 support is absent ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23501/files - new: https://git.openjdk.org/jdk/pull/23501/files/b09f1a01..be13918d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=04-05 Stats: 150 lines in 1 file changed: 0 ins; 140 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/23501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23501/head:pull/23501 PR: https://git.openjdk.org/jdk/pull/23501 From iklam at openjdk.org Thu Mar 6 20:37:55 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 6 Mar 2025 20:37:55 GMT Subject: RFR: 8350982: -server|-client causes fatal exception on static JDK [v2] In-Reply-To: References: <2fluPJGNiu9SvOwq6MfyLch7lChTjPlOJh7dcrXxfa4=.55bdfd48-08dd-4f14-b683-88fccbc66e1e@github.com> <_mnGdermmAHz7Uc0ImXgngswbkF1RgXjycQKTQAiz9I=.da23ce88-3967-4891-9bd7-b2b0a04e76ab@github.com> Message-ID: On Thu, 6 Mar 2025 17:55:28 GMT, Jiangli Zhou wrote: >> Would it be better to use @requires !jdk.static for now so this test isn't selected when testing static builds? There is work further down the line to figure out a story for static builds + AOT as the ultimate goal is to run jlink and produce a single executable so there wouldn't be a separate AOT cache/archive on the file system. It's much further down the road, not clear that it's worth trying to test intermediate tests with -Xshare:auto at this time. > > Looking back at the history, the `-Xshare:auto` & `-server` issue (https://bugs.openjdk.org/browse/JDK-8005933) was related to this code in JDK 8: > > https://github.com/openjdk/jdk8u-dev/commit/a1f3a95880318c32169ff89dc8784a0dfc629eec > > In JDK 8, on c2 sharedSpaces is disabled if the following condition is true. 
> > > !DumpSharedSpaces && !RequireSharedSpaces && > (FLAG_IS_DEFAULT(UseSharedSpaces) || !UseSharedSpaces) > > > That code no longer exists in mainline. The `-server` option has become a no-op. The launcher code, `CheckJvmType`, removes `-server` from the command line arguments passed to the VM side. As a result, `-server` has no real impact on CDS now. For this particular test, I think it's OK to remove `-server` (so it won't fail with static JDK) and ` * @bug 8005933` (as the condition related to that particular bug no longer exists). But we should keep the test for the purpose "-Xshare:auto is the default when -Xshare is not specified" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23881#discussion_r1984015402 From kxu at openjdk.org Thu Mar 6 21:45:28 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 6 Mar 2025 21:45:28 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v5] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 08:29:00 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> remove tri-conditionals > > This looks interesting! Thanks @tabjy for the work! > > For now, I just have some drive-through comments about testing. > > Also: would it make sense to have a JMH benchmark to prove that this code change is beneficial enough to warrant the additional complexity? Added randomized tests per @eme64's suggestions. I'm still looking into a microbenchmark with JMH. > test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 291: > >> 289: return i + (x << i) + i; // Expects 64 + 63 + 64 = 191 >> 290: } >> 291: } > > Would it make sense to add some randomized patterns, just for result verification? > > You can use `Generators.java` to get interesting values. Of course that would mean not doing IR verification, but at least it would give us better confidence that the values are correct.
> > I'm imagining expressions like this: > `return a * CON1 + a * CON2 + a * CON3 + a * CON4` > > Where the CON are defined as a `public static final` field with a random value generated by `Generators`. > > The advantage of using `Generators` is that it generates powers-of-two more frequently, which seems to be relevant here. That's a very valid suggestion. Thanks. I updated the test. Please take a look if I understood your intention correctly. However, it does look like this `a * CON1 + a * CON2 + ...` pattern is picked up by the existing associative optimization first: https://github.com/openjdk/jdk/blob/a23fb0af65f491ef655ba114fcc8032a09a55213/src/hotspot/share/opto/addnode.cpp#L345-L377 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-2705006453 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1984090578 From kxu at openjdk.org Thu Mar 6 21:45:28 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 6 Mar 2025 21:45:28 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v6] In-Reply-To: References: Message-ID: <6OubCU2RJp0iYbQmFllfRFn0bltMg3sHHxiFpc-g678=.9e94feb4-499b-4ac7-b74c-ce6d00e317dd@github.com> > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495), which was [first merged](https://git.openjdk.org/jdk/pull/20754) and then backed out due to a regression. This patch redoes the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. > > When constantizing multiplications (possibly in the form of `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ?
(jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added. Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: add randomized power of two addition tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23506/files - new: https://git.openjdk.org/jdk/pull/23506/files/358bcbac..5b972e9a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=04-05 Stats: 47 lines in 2 files changed: 45 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From chagedorn at openjdk.org Thu Mar 6 21:54:58 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 6 Mar 2025 21:54:58 GMT Subject: RFR: 8351345: Test: Print out non-whitelisted JTreg VM or Javaoptions flag In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 12:16:57 GMT, Hamlin Li wrote: > Hi, > Can you help to review this trivial patch? > Currently, it only reports that some flag is non-whitelisted, but does not print out the flag explicitly, but it's helpful to do so. > > Thanks! test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 752: > 750: if (!flag.startsWith("-D") && !flag.startsWith("-e") && JTREG_WHITELIST_FLAGS.stream().noneMatch(flag::contains)) { > 751: // Found VM flag that is not whitelisted > 752: System.out.println("Non-whitelisted JTreg VM or Javaoptions flag: " + flag); That's a good idea! Is the intention to just report the first non-whitelisted flag found or all of them? I guess just one of them is fine to indicate the reason for not performing IR matching (could be verbose to report all non-whitelisted flags otherwise). 
I would merge this message with the already existing one here and remove that one in favor of the new one: https://github.com/openjdk/jdk/blob/a23fb0af65f491ef655ba114fcc8032a09a55213/test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java#L612-L615 Another thought while cleaning this up: We could also improve this message https://github.com/openjdk/jdk/blob/a23fb0af65f491ef655ba114fcc8032a09a55213/test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java#L597-L602 and split it into three separate bailouts + messages. Could be part of this RFE (you could then set a new title for the issue to something like `[IR Framework] Improve reported disabled IR verification messages`). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23931#discussion_r1984095845 From eastigeevich at openjdk.org Thu Mar 6 22:33:57 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 6 Mar 2025 22:33:57 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache In-Reply-To: References: Message-ID: <12XVxX3jh5YAN_AXwNYOLJKZfPpmogU_Dcg36Vk7m40=.594da04d-9a08-4856-8934-a6a030cfe1b0@github.com> On Thu, 6 Mar 2025 19:01:03 GMT, Vladimir Kozlov wrote: > Hmm, I am wondering what effect you are measuring. You executing methods sequentially. Is it possible the effects you see is CodeCache pages loading. The more groups you have, the more space they take in CodeCache. I don't know what's happening on Intel. It's an open question. Maybe someone from Intel can explain the regressions. Or I'll try to figure it out later. On Neoverse: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/neoverse-v1-platform-a-new-performance-tier-for-arm: > **Front-end - Branch prediction and stalls** > > Another significant change is doubling the number of concurrent code regions tracked in the front-end of the design which results in a large speedup for Java-type workloads with sparse code regions.
Net result of all these improvements is up to 90% reduction in branch mis-predicts (for BTB misses) and up to 50% reduction in front-end stalls for common server/HPC workloads, compared to Neoverse N1 CPU. You can see that the Neoverse front-end is limited in how many code regions it can handle without stalls. Don't confuse these with OS pages. I measure the effect of code being in many different regions. When all nmethods are in the minimum number of code regions, calls to them have no penalties. As soon as we spread nmethods among many code regions, code becomes sparse and we start getting stalls. > Is it possible the effects you see is CodeCache pages loading. The more groups you have, the more space they take in CodeCache. Yes, the percentage of iTLB-load-misses increased, but it's low: it changed from 0.5% to 0.6%. The main issue is too many code regions. > May be it can be achieved by adding blobs of different sizes between nmethods. You already use blobs to align group's start address. This will only make code more sparse. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2705085399 PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2705088419 From jiangli at openjdk.org Thu Mar 6 23:32:55 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 6 Mar 2025 23:32:55 GMT Subject: RFR: 8350982: -server|-client causes fatal exception on static JDK [v2] In-Reply-To: References: <2fluPJGNiu9SvOwq6MfyLch7lChTjPlOJh7dcrXxfa4=.55bdfd48-08dd-4f14-b683-88fccbc66e1e@github.com> <_mnGdermmAHz7Uc0ImXgngswbkF1RgXjycQKTQAiz9I=.da23ce88-3967-4891-9bd7-b2b0a04e76ab@github.com> Message-ID: On Thu, 6 Mar 2025 07:44:43 GMT, Alan Bateman wrote: >> I think the test still serves as a functional test: if an archive is specified, then -Xshare:auto is used by default. > > Would it be better to use @requires !jdk.static for now so this test isn't selected when testing static builds?
There is work further down the line to figure out a story for static builds + AOT as the ultimate goal is to run jlink and produce a single executable so there wouldn't be a separate AOT cache/archive on the file system. It's much further down the road, not clear that it's worth trying to test intermediate tests with -Xshare:auto at this time. Thanks. I agree with keeping the test. That's also related to @AlanBateman's comment on whether the test needs to be skipped for the static JDK. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23881#discussion_r1984185761 From kvn at openjdk.org Fri Mar 7 00:49:30 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Mar 2025 00:49:30 GMT Subject: RFR: 8348261: assert(n->is_Mem()) failed: memory node required Message-ID: Add missing check for StrInflatedCopy intrinsic in C2 Escape Analysis. This is a very rare case since we do not usually use Latin1.inflate(). In the failing case we inline both paths in [String.getBytes()](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L4808) and eliminate the `TreeMap$EntryIterator` allocation: # java.lang.String::getBytes @ bci:40 (line 4812) L[0]=rsp + #64 L[1]=rsp + #132 L[2]=rsp + #120 L[3]=rsp + #124 STK[0]=rsp + #128 STK[1]=#0 STK[2]=rsp + #132 STK[3]=rsp + #120 STK[4]=rsp + #164 # java.lang.AbstractStringBuilder::putStringAt @ bci:15 (line 1754) L[0]=rsp + #0 L[1]=rsp + #120 L[2]=rsp + #64 # java.lang.AbstractStringBuilder::append @ bci:30 (line 592) L[0]=rsp + #0 L[1]=rsp + #64 L[2]=rsp + #28 # java.lang.StringBuilder::append @ bci:2 (line 179) L[0]=rsp + #0 L[1]=rsp + #64 # java.lang.StringBuilder::append @ bci:5 (line 173) L[0]=rsp + #0 L[1]=rsp + #32 # sun.util.locale.LocaleExtensions::toID @ bci:100 (line 206) L[0]=RBP L[1]=rsp + #0 L[2]=rsp + #8 L[3]=#ScObj0 L[4]=rsp + #16 L[5]=rsp + #24 L[6]=rsp + #32 # ScObj0 java/util/TreeMap$EntryIterator={ [expectedModCount :0]=rsp + #140, [next :1]=rsp + #176, [lastReturned :2]=rsp + #16, [this$0
:3]=rsp + #168 } Unfortunately I was not able to create a standalone test - it seems to require very particular frequencies of executed paths and used features/flags. The fix was verified with the compilation replay file from the bug report. I am running testing and will let you know the results. ------------- Commit messages: - assert(n->is_Mem()) failed: memory node required Changes: https://git.openjdk.org/jdk/pull/23938/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23938&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8348261 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23938.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23938/head:pull/23938 PR: https://git.openjdk.org/jdk/pull/23938 From xgong at openjdk.org Fri Mar 7 02:16:53 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 7 Mar 2025 02:16:53 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: <0IbdTO_6bZbOFLdcyBOLJcQzDWjYk2b70czwKJ8TN8c=.3895e38e-ec11-4c8f-b772-b9143bbc5791@github.com> On Wed, 26 Feb 2025 01:18:57 GMT, Xiaohong Gong wrote: > The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. > > Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations.
>
> This patch added the rearrange support for vector types with small lane count. Here are the main changes:
> - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`)
> - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation
> - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one
>
> Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged.
>
> 1) NEON
>
> JMH on panama-vector:vectorIntrinsics:
>
> Benchmark                     (size)  Mode   Cnt  Units    Before     After    Gain
> Double128Vector.rearrange       1024  thrpt   30  ops/ms   78.060   578.859   7.42x
> Double128Vector.sliceUnary      1024  thrpt   30  ops/ms   72.332  1811.664  25.05x
> Double128Vector.unsliceUnary    1024  thrpt   30  ops/ms   72.256  1812.344  25.08x
> Float64Vector.rearrange         1024  thrpt   30  ops/ms   77.879   558.797   7.18x
> Float64Vector.sliceUnary        1024  thrpt   30  ops/ms   70.528  1981.304  28.09x
> Float64Vector.unsliceUnary      1024  thrpt   30  ops/ms   71.735  1994.168  27.79x
> Int64Vector.rearrange           1024  thrpt   30  ops/ms   76.374   562.106   7.36x
> Int64Vector.sliceUnary          1024  thrpt   30  ops/ms   71.680  1190.127  16.60x
> Int64Vector.unsliceUnary        1024  thrpt   30  ops/ms   71.895  1185.094  16.48x
> Long128Vector.rearrange         1024  thrpt   30  ops/ms   78.902   579.250   7.34x
> Long128Vector.sliceUnary        1024  thrpt   30  ops/ms   72.389   747.794  10.33x
> Long128Vector.unsliceUnary      1024  thrpt   30  ops/ms   71.999   747.848  10.38x
>
>
> JMH on jdk mainline:
>
> Benchmark ...

Hi @theRealAph , could you please help take a look at this PR? Any feedback is welcome. Thanks a lot in advance!
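For readers less familiar with the operation under discussion, a rearrange picks each result lane from the source lane named by a shuffle index. The following is only a scalar sketch in plain Java, assuming in-range shuffle indices; it is not the Vector API or the AArch64 implementation from the PR, and the class and method names are hypothetical.

```java
import java.util.Arrays;

class RearrangeSketch {
    // Scalar model of a lane rearrange: result lane i takes source lane shuffle[i].
    // Assumes every shuffle index is within [0, src.length).
    static long[] rearrange(long[] src, int[] shuffle) {
        long[] result = new long[src.length];
        for (int i = 0; i < src.length; i++) {
            result[i] = src[shuffle[i]];
        }
        return result;
    }

    public static void main(String[] args) {
        // Two lanes, as in a 128-bit vector of long elements.
        long[] twoLanes = {10L, 20L};
        int[] swap = {1, 0};
        System.out.println(Arrays.toString(rearrange(twoLanes, swap)));
    }
}
```

With two lanes the only non-trivial shuffles are a swap and a broadcast of one lane, which is the small-lane-count situation the quoted description refers to.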
------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2705360833 From sviswanathan at openjdk.org Fri Mar 7 02:24:23 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 7 Mar 2025 02:24:23 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input Message-ID: Float.float16ToFloat generates wrong vectorized code in product build and asserts in fastdebug/debug when argument is of type byte, int, or long array. The short term solution is to not auto vectorize in these cases. Review comments are welcome. Best Regards, Sandhya ------------- Commit messages: - whitespace - some updates - 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input Changes: https://git.openjdk.org/jdk/pull/23939/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23939&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350835 Stats: 137 lines in 2 files changed: 136 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23939.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23939/head:pull/23939 PR: https://git.openjdk.org/jdk/pull/23939 From kvn at openjdk.org Fri Mar 7 05:27:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Mar 2025 05:27:51 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 01:56:49 GMT, Sandhya Viswanathan wrote: > Float.float16ToFloat generates wrong vectorized code in product build and asserts in fastdebug/debug when argument is of type byte, int, or long array. The short term solution is to not auto vectorize in these cases. > > Review comments are welcome. > > Best Regards, > Sandhya Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23939#pullrequestreview-2666268567 From chagedorn at openjdk.org Fri Mar 7 07:27:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Mar 2025 07:27:52 GMT Subject: RFR: 8348261: assert(n->is_Mem()) failed: memory node required In-Reply-To: References: Message-ID: <5SpyYxNBkzrHmsTQMJPRiCwUMNmuAhInSO3fDG0vqcY=.9aeedc34-ea62-4086-8866-05805429c2ea@github.com> On Fri, 7 Mar 2025 00:44:44 GMT, Vladimir Kozlov wrote: > Add missing check for StrInflatedCopy intrinsic in C2 Escape Analysis. > > Very rare case since we not usually use Latin1.inflate(). In failing case we inline both paths in [String.getBytes()](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L4808) and eliminate `TreeMap$EntryIterator allocation: > > > # java.lang.String::getBytes @ bci:40 (line 4812) L[0]=rsp + #64 L[1]=rsp + #132 L[2]=rsp + #120 L[3]=rsp + #124 STK[0]=rsp + #128 STK[1]=#0 STK[2]=rsp + #132 STK[3]=rsp + #120 STK[4]=rsp + #164 > # java.lang.AbstractStringBuilder::putStringAt @ bci:15 (line 1754) L[0]=rsp + #0 L[1]=rsp + #120 L[2]=rsp + #64 > # java.lang.AbstractStringBuilder::append @ bci:30 (line 592) L[0]=rsp + #0 L[1]=rsp + #64 L[2]=rsp + #28 > # java.lang.StringBuilder::append @ bci:2 (line 179) L[0]=rsp + #0 L[1]=rsp + #64 > # java.lang.StringBuilder::append @ bci:5 (line 173) L[0]=rsp + #0 L[1]=rsp + #32 > # sun.util.locale.LocaleExtensions::toID @ bci:100 (line 206) L[0]=RBP L[1]=rsp + #0 L[2]=rsp + #8 L[3]=#ScObj0 L[4]=rsp + #16 L[5]=rsp + #24 L[6]=rsp + #32 > # ScObj0 java/util/TreeMap$EntryIterator={ [expectedModCount :0]=rsp + #140, [next :1]=rsp + #176, [lastReturned :2]=rsp + #16, [this$0 :3]=rsp + #168 } > > > Unfortunately I was not able to create standalone test - it seems requires very particular frequencies of executed paths and used features/flags. The fix was verified with compilation replay file from the bug report. 
> > I am running testing and will let you know results. Looks good to me! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23938#pullrequestreview-2666457820 From jbhateja at openjdk.org Fri Mar 7 07:58:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 7 Mar 2025 07:58:54 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 01:56:49 GMT, Sandhya Viswanathan wrote: > Float.float16ToFloat generates wrong vectorized code in product build and asserts in fastdebug/debug when argument is of type byte, int, or long array. The short term solution is to not auto vectorize in these cases. > > Review comments are welcome. > > Best Regards, > Sandhya Hi @sviswa7, Fix looks reasonable to me. Kindly consider including some suggestions. Best Regards test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 62: > 60: } > 61: > 62: @Test Suggestion: @Test @IR(failOn = { IRNode.VECTOR_CAST_HF2F }, applyIfCPUFeatureOr = { "avx512vl", "true", "f16c", "true" }) test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 71: > 69: } > 70: > 71: @Test Suggestion: @Test @IR(counts = { IRNode.VECTOR_CAST_HF2F, " >0 " }, applyIfCPUFeatureOr = { "avx512vl", "true", "f16c", "true" }) test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 80: > 78: } > 79: > 80: @Test Suggestion: /* * C2 handles i2s conversion by constraining the value range of the integral argument; thus * argument fed to ConvHF2F is of type T_INT. Fix for JDK-8350835 skips over vectorizing such a case * for now.
*/ @Test @IR(failOn = { IRNode.VECTOR_CAST_HF2F }, applyIfCPUFeatureOr = { "avx512vl", "true", "f16c", "true" }) test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 89: > 87: } > 88: > 89: @Test Suggestion: /* * C2 handles this in two steps: l2i handling creates ConvL2I IR, followed by i2s conversion which constrains the * value range of the integral argument; thus, the argument fed to ConvHF2F is of type T_INT. Fix for * JDK-8350835 skips over vectorizing such a case for now. */ @Test @IR(failOn = { IRNode.VECTOR_CAST_HF2F }, applyIfCPUFeatureOr = { "avx512vl", "true", "f16c", "true" }) ------------- PR Review: https://git.openjdk.org/jdk/pull/23939#pullrequestreview-2666472302 PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1984601608 PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1984602177 PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1984603876 PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1984605156 From chagedorn at openjdk.org Fri Mar 7 08:11:26 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Mar 2025 08:11:26 GMT Subject: RFR: 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly Message-ID: This patch is a preparatory patch for fixing https://github.com/openjdk/jdk/pull/23823 (already out for review) without relying on predicate matching during IGVN (currently proposed matching with IGVN but after looking into it and further discussions with @rwestrel, we think it's best to do it outside IGVN). ### Update Assertion Predicate Killing Mechanism The main contribution of this patch is to update the killing mechanism of Assertion Predicates. Currently, we kill an Assertion Predicate by replacing its `Opaque*AssertionPredicate` node with `ConI [int:1]`. This creates problems in our predicate matching code (see for example, [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637)).
To fix this and prepare for https://github.com/openjdk/jdk/pull/23823, we move to a different approach.

#### Mark `Opaque*AssertionPredicate` Nodes Useless
Instead of directly inserting constants, we mark `Opaque*AssertionPredicate` nodes useless when we find that an Assertion Predicate should be removed. In the next IGVN phase, we fold it by taking the success path. This requires the following updates:
- Introduce `_useless` flags at `Opaque*AssertionPredicate` nodes and associated mark methods.
- Check the flags in `Value()` and return `TypeInt::ONE` if we find them useless.
- Add a `cmp()` method for `OpaqueInitializedAssertionPredicate` to avoid commoning up a useless with a useful node.
- `OpaqueTemplateAssertionPredicate` nodes are by design unique to the Template Assertion Predicate because they have non-hashable `OpaqueLoop*` nodes on their input paths. I still added `hash()` that returns `NO_HASH` as an additional guarantee/contract.

#### Update Predicate Iteration Code
To make this new approach work, we need to update the predicate iteration code. We still match Assertion Predicates with an `OpaqueAssertion*Node` to skip over them, but we skip calling the `visit()` method since there should not be any work to be done for killed Assertion Predicates. This includes the additional change:
- Refactoring `RegularPredicateBlockIterator::for_each()`, which became bigger. I extracted some methods and decided to use a field instead of a local variable. This meant that I also had to remove `const` from the `for_each()` methods, which I think is fine for a method that does iterations and needs to do some bookkeeping.

#### Other Updates
I've also applied some small refactorings of touched code.
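The mark-then-fold idea described above can be illustrated with a small toy model. This is a sketch in plain Java under loud assumptions: `OpaqueNode`, `markUseless`, `value` and `countVisited` are hypothetical names, the worklist is an ordinary deque, and nothing here is HotSpot code. Pushing to the worklist inside `markUseless` is just one possible choice made for the sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class UselessMarkingSketch {
    // Toy stand-in for an Opaque*AssertionPredicate node; hypothetical, not HotSpot code.
    static final class OpaqueNode {
        private boolean useless = false;

        boolean isUseless() {
            return useless;
        }

        // Mark the node and queue it so that a later simplification pass revisits it.
        void markUseless(Deque<OpaqueNode> worklist) {
            useless = true;
            worklist.push(this);
        }

        // Loose analogue of Value(): a useless node folds to the constant 1 (the success path);
        // a useful node stays opaque, modelled here as null.
        Integer value() {
            return useless ? Integer.valueOf(1) : null;
        }
    }

    // Loose analogue of the iteration change: every node is still matched,
    // but only nodes that are not marked useless count as visited.
    static int countVisited(List<OpaqueNode> nodes) {
        int visited = 0;
        for (OpaqueNode node : nodes) {
            if (node.isUseless()) {
                continue; // killed predicate: skip the visit
            }
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        Deque<OpaqueNode> worklist = new ArrayDeque<>();
        OpaqueNode kept = new OpaqueNode();
        OpaqueNode killed = new OpaqueNode();
        killed.markUseless(worklist);

        System.out.println("kept folds to: " + kept.value());
        System.out.println("killed folds to: " + killed.value());
        System.out.println("worklist size: " + worklist.size());
        System.out.println("visited: " + countVisited(List.of(kept, killed)));
    }
}
```

The point of the sketch is that the node itself stays in place and remains recognizable to matching code until the folding step runs, instead of being swapped for a constant at the moment it is killed.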
Thanks, Christian ------------- Commit messages: - 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly Changes: https://git.openjdk.org/jdk/pull/23941/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23941&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351280 Stats: 205 lines in 6 files changed: 134 ins; 18 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/23941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23941/head:pull/23941 PR: https://git.openjdk.org/jdk/pull/23941 From chagedorn at openjdk.org Fri Mar 7 08:11:27 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Mar 2025 08:11:27 GMT Subject: RFR: 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 08:01:13 GMT, Christian Hagedorn wrote: > This patch is a preparatory patch for fixing https://github.com/openjdk/jdk/pull/23823 (already out for review) without relying on predicate matching during IGVN (currently proposed matching with IGVN but after looking into it and further discussions with @rwestrel, we think it's best to do it outside IGVN). > > ### Update Assertion Predicate Killing Mechanism > The main contribution of this patch is to update the killing mechanism of Assertion Predicates. Currently, we kill an Assertion Predicate by replacing its `Opaque*AssertionPredicate` node with `ConI [int:1]`. This creates problems in our predicate matching code (see for example, [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637)). To fix this and prepare for https://github.com/openjdk/jdk/pull/23823, we move to a different approach. > > #### Mark Opaque*AssertionPredicate` Nodes Useless > Instead of directly inserting constants, we mark `Opaque*AssertionPredicate` nodes useless when we find that an Assertion Predicate should be removed. In the next IGVN phase, we are folding it by taking the success path. 
This requires the following updates: > - Introduce `_useless` flags at `Opaque*AssertionPredicate` nodes and associated mark methods. > - Check the flags in `Value()` and return `TypeInt::ONE` if we find them useless. > - Adding `cmp()` method for `OpaqueInitializedAssertionPredicate` to avoid commoning up a useless with a useful node. > - `OpaqueTemplateAssertionPredicate` nodes are by design unique to the Template Assertion Predicate because they have non-hashable `OpaqueLoop*` nodes on their input paths. I still added `hash()` that returns `NO_HASH` as an additional guarantee/contract. > > #### Update Predicate Iteration Code > To make this new approach work, we need to update the predicate iteration code. We still match Assertion Predicates with an `OpaqueAssertion*Node` to skip over them but we skip to call the `visit()` method since there should not be any work to be done for killed Assertion Predicates. This includes the additional change: > - Refactoring `RegularPredicateBlockIterator::for_each()` which became bigger. I extracted some methods and decided to use a field instead of a local variable. This meant that I also had to remove `const` from the `for_each()` methods which I think is fine for a method that does iterations and needs to do some book keeping. > > #### Other Updates > I've also applied some small refactorings of touched code. > > Thanks, > Christian src/hotspot/share/opto/opaquenode.cpp line 115: > 113: // into the bool input of an If node and can thus be replaced by true to let the Template Assertion Predicate be > 114: // folded away (the success path is always the true path by design). > 115: return phase->intcon(1); Moved to `Value()`. There is also no guarantee that this is an existing node (though `ConI #1` should probably always exist at that point). 
src/hotspot/share/opto/opaquenode.cpp line 157: > 155: > 156: #ifndef PRODUCT > 157: void OpaqueInitializedAssertionPredicateNode::dump_spec(outputStream* st) const { Added dump for better debugging, also done for `OpaqueTemplateAssertionPredicateNode`. src/hotspot/share/opto/opaquenode.hpp line 173: > 171: > 172: virtual int Opcode() const; > 173: virtual uint size_of() const { return sizeof(*this); } Required since we add a first field to the class. src/hotspot/share/opto/predicates.cpp line 228: > 226: OpaqueTemplateAssertionPredicateNode* opaque_node = this->opaque_node(); > 227: opaque_node->mark_useless(); > 228: igvn._worklist.push(opaque_node); Mark the node as useless instead of replacing it with a constant directly. src/hotspot/share/opto/predicates.cpp line 1113: > 1111: if (initialized_assertion_predicate.is_last_value()) { > 1112: // Only Last Value Initialized Assertion Predicates need to be killed and updated. > 1113: initialized_assertion_predicate.kill(_phase->igvn()); Needed to move this method to the source file because of incomplete `PhaseIdealLoop` type when trying to access `igvn()`. src/hotspot/share/opto/predicates.hpp line 692: > 690: if (process_initialized_assertion_predicate(predicate_visitor)) { > 691: continue; > 692: } Refactoring: Split into methods, and each method checks whether the opaque node is marked useless before visiting.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23941#discussion_r1984618146 PR Review Comment: https://git.openjdk.org/jdk/pull/23941#discussion_r1984620892 PR Review Comment: https://git.openjdk.org/jdk/pull/23941#discussion_r1984621379 PR Review Comment: https://git.openjdk.org/jdk/pull/23941#discussion_r1984622086 PR Review Comment: https://git.openjdk.org/jdk/pull/23941#discussion_r1984622636 PR Review Comment: https://git.openjdk.org/jdk/pull/23941#discussion_r1984623991 From epeter at openjdk.org Fri Mar 7 09:08:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 09:08:54 GMT Subject: RFR: 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 08:01:13 GMT, Christian Hagedorn wrote: > This patch is a preparatory patch for fixing https://github.com/openjdk/jdk/pull/23823 (already out for review) without relying on predicate matching during IGVN (currently proposed matching with IGVN but after looking into it and further discussions with @rwestrel, we think it's best to do it outside IGVN). > > ### Update Assertion Predicate Killing Mechanism > The main contribution of this patch is to update the killing mechanism of Assertion Predicates. Currently, we kill an Assertion Predicate by replacing its `Opaque*AssertionPredicate` node with `ConI [int:1]`. This creates problems in our predicate matching code (see for example, [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637)). To fix this and prepare for https://github.com/openjdk/jdk/pull/23823, we move to a different approach. > > #### Mark Opaque*AssertionPredicate` Nodes Useless > Instead of directly inserting constants, we mark `Opaque*AssertionPredicate` nodes useless when we find that an Assertion Predicate should be removed. In the next IGVN phase, we are folding it by taking the success path. 
This requires the following updates: > - Introduce `_useless` flags at `Opaque*AssertionPredicate` nodes and associated mark methods. > - Check the flags in `Value()` and return `TypeInt::ONE` if we find them useless. > - Adding `cmp()` method for `OpaqueInitializedAssertionPredicate` to avoid commoning up a useless with a useful node. > - `OpaqueTemplateAssertionPredicate` nodes are by design unique to the Template Assertion Predicate because they have non-hashable `OpaqueLoop*` nodes on their input paths. I still added `hash()` that returns `NO_HASH` as an additional guarantee/contract. > > #### Update Predicate Iteration Code > To make this new approach work, we need to update the predicate iteration code. We still match Assertion Predicates with an `OpaqueAssertion*Node` to skip over them but we skip to call the `visit()` method since there should not be any work to be done for killed Assertion Predicates. This includes the additional change: > - Refactoring `RegularPredicateBlockIterator::for_each()` which became bigger. I extracted some methods and decided to use a field instead of a local variable. This meant that I also had to remove `const` from the `for_each()` methods which I think is fine for a method that does iterations and needs to do some book keeping. > > #### Other Updates > I've also applied some small refactorings of touched code. > > Thanks, > Christian Looks good, I just had some minor questions / suggestions :) src/hotspot/share/opto/opaquenode.cpp line 142: > 140: bool OpaqueInitializedAssertionPredicateNode::cmp(const Node &n) const { > 141: return _useless == n.as_OpaqueInitializedAssertionPredicate()->is_useless(); > 142: } Ah interesting. But do we ever want to common these nodes? If we never want to common them, you could just fix it with the hash, right? 
35 class Opaque1Node : public Node { 36 virtual uint hash() const ; // { return NO_HASH; } src/hotspot/share/opto/opaquenode.cpp line 154: > 152: _useless = true; > 153: igvn._worklist.push(this); > 154: } Here you directly push to worklist. And for `OpaqueTemplateAssertionPredicateNode` you seem to do it at the call-site. We should probably unify this. If you want to push inside `mark_useless`, then you should probably adjust the code for `OpaqueMultiversioning` as well. ------------- PR Review: https://git.openjdk.org/jdk/pull/23941#pullrequestreview-2666631783 PR Review Comment: https://git.openjdk.org/jdk/pull/23941#discussion_r1984686387 PR Review Comment: https://git.openjdk.org/jdk/pull/23941#discussion_r1984691481 From epeter at openjdk.org Fri Mar 7 09:08:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 09:08:55 GMT Subject: RFR: 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 08:59:18 GMT, Emanuel Peter wrote: >> This patch is a preparatory patch for fixing https://github.com/openjdk/jdk/pull/23823 (already out for review) without relying on predicate matching during IGVN (currently proposed matching with IGVN but after looking into it and further discussions with @rwestrel, we think it's best to do it outside IGVN). >> >> ### Update Assertion Predicate Killing Mechanism >> The main contribution of this patch is to update the killing mechanism of Assertion Predicates. Currently, we kill an Assertion Predicate by replacing its `Opaque*AssertionPredicate` node with `ConI [int:1]`. This creates problems in our predicate matching code (see for example, [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637)). To fix this and prepare for https://github.com/openjdk/jdk/pull/23823, we move to a different approach. 
>> >> #### Mark Opaque*AssertionPredicate` Nodes Useless >> Instead of directly inserting constants, we mark `Opaque*AssertionPredicate` nodes useless when we find that an Assertion Predicate should be removed. In the next IGVN phase, we are folding it by taking the success path. This requires the following updates: >> - Introduce `_useless` flags at `Opaque*AssertionPredicate` nodes and associated mark methods. >> - Check the flags in `Value()` and return `TypeInt::ONE` if we find them useless. >> - Adding `cmp()` method for `OpaqueInitializedAssertionPredicate` to avoid commoning up a useless with a useful node. >> - `OpaqueTemplateAssertionPredicate` nodes are by design unique to the Template Assertion Predicate because they have non-hashable `OpaqueLoop*` nodes on their input paths. I still added `hash()` that returns `NO_HASH` as an additional guarantee/contract. >> >> #### Update Predicate Iteration Code >> To make this new approach work, we need to update the predicate iteration code. We still match Assertion Predicates with an `OpaqueAssertion*Node` to skip over them but we skip to call the `visit()` method since there should not be any work to be done for killed Assertion Predicates. This includes the additional change: >> - Refactoring `RegularPredicateBlockIterator::for_each()` which became bigger. I extracted some methods and decided to use a field instead of a local variable. This meant that I also had to remove `const` from the `for_each()` methods which I think is fine for a method that does iterations and needs to do some book keeping. >> >> #### Other Updates >> I've also applied some small refactorings of touched code. >> >> Thanks, >> Christian > > src/hotspot/share/opto/opaquenode.cpp line 154: > >> 152: _useless = true; >> 153: igvn._worklist.push(this); >> 154: } > > Here you directly push to worklist. And for `OpaqueTemplateAssertionPredicateNode` you seem to do it at the call-site. We should probably unify this. 
If you want to push inside `mark_useless`, then you should probably adjust the code for `OpaqueMultiversioning` as well. Pushing inside of `mark_useless` has the advantage that one cannot forget pushing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23941#discussion_r1984692456 From epeter at openjdk.org Fri Mar 7 09:08:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 09:08:56 GMT Subject: RFR: 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 08:06:16 GMT, Christian Hagedorn wrote: >> This patch is a preparatory patch for fixing https://github.com/openjdk/jdk/pull/23823 (already out for review) without relying on predicate matching during IGVN (currently proposed matching with IGVN but after looking into it and further discussions with @rwestrel, we think it's best to do it outside IGVN). >> >> ### Update Assertion Predicate Killing Mechanism >> The main contribution of this patch is to update the killing mechanism of Assertion Predicates. Currently, we kill an Assertion Predicate by replacing its `Opaque*AssertionPredicate` node with `ConI [int:1]`. This creates problems in our predicate matching code (see for example, [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637)). To fix this and prepare for https://github.com/openjdk/jdk/pull/23823, we move to a different approach. >> >> #### Mark Opaque*AssertionPredicate` Nodes Useless >> Instead of directly inserting constants, we mark `Opaque*AssertionPredicate` nodes useless when we find that an Assertion Predicate should be removed. In the next IGVN phase, we are folding it by taking the success path. This requires the following updates: >> - Introduce `_useless` flags at `Opaque*AssertionPredicate` nodes and associated mark methods. >> - Check the flags in `Value()` and return `TypeInt::ONE` if we find them useless. 
>> - Adding `cmp()` method for `OpaqueInitializedAssertionPredicate` to avoid commoning up a useless with a useful node. >> - `OpaqueTemplateAssertionPredicate` nodes are by design unique to the Template Assertion Predicate because they have non-hashable `OpaqueLoop*` nodes on their input paths. I still added `hash()` that returns `NO_HASH` as an additional guarantee/contract. >> >> #### Update Predicate Iteration Code >> To make this new approach work, we need to update the predicate iteration code. We still match Assertion Predicates with an `OpaqueAssertion*Node` to skip over them but we skip to call the `visit()` method since there should not be any work to be done for killed Assertion Predicates. This includes the additional change: >> - Refactoring `RegularPredicateBlockIterator::for_each()` which became bigger. I extracted some methods and decided to use a field instead of a local variable. This meant that I also had to remove `const` from the `for_each()` methods which I think is fine for a method that does iterations and needs to do some book keeping. >> >> #### Other Updates >> I've also applied some small refactorings of touched code. >> >> Thanks, >> Christian > > src/hotspot/share/opto/predicates.cpp line 228: > >> 226: OpaqueTemplateAssertionPredicateNode* opaque_node = this->opaque_node(); >> 227: opaque_node->mark_useless(); >> 228: igvn._worklist.push(opaque_node); > > Mark instead of replacement by constant directly. Here you push to worklist at call-site. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23941#discussion_r1984695504 From epeter at openjdk.org Fri Mar 7 09:15:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 09:15:53 GMT Subject: RFR: 8348261: assert(n->is_Mem()) failed: memory node required In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 00:44:44 GMT, Vladimir Kozlov wrote: > Add missing check for StrInflatedCopy intrinsic in C2 Escape Analysis. 
> > Very rare case since we do not usually use Latin1.inflate(). In the failing case we inline both paths in [String.getBytes()](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L4808) and eliminate the `TreeMap$EntryIterator` allocation: > > > # java.lang.String::getBytes @ bci:40 (line 4812) L[0]=rsp + #64 L[1]=rsp + #132 L[2]=rsp + #120 L[3]=rsp + #124 STK[0]=rsp + #128 STK[1]=#0 STK[2]=rsp + #132 STK[3]=rsp + #120 STK[4]=rsp + #164 > # java.lang.AbstractStringBuilder::putStringAt @ bci:15 (line 1754) L[0]=rsp + #0 L[1]=rsp + #120 L[2]=rsp + #64 > # java.lang.AbstractStringBuilder::append @ bci:30 (line 592) L[0]=rsp + #0 L[1]=rsp + #64 L[2]=rsp + #28 > # java.lang.StringBuilder::append @ bci:2 (line 179) L[0]=rsp + #0 L[1]=rsp + #64 > # java.lang.StringBuilder::append @ bci:5 (line 173) L[0]=rsp + #0 L[1]=rsp + #32 > # sun.util.locale.LocaleExtensions::toID @ bci:100 (line 206) L[0]=RBP L[1]=rsp + #0 L[2]=rsp + #8 L[3]=#ScObj0 L[4]=rsp + #16 L[5]=rsp + #24 L[6]=rsp + #32 > # ScObj0 java/util/TreeMap$EntryIterator={ [expectedModCount :0]=rsp + #140, [next :1]=rsp + #176, [lastReturned :2]=rsp + #16, [this$0 :3]=rsp + #168 } > > > Unfortunately I was not able to create a standalone test - it seems to require very particular frequencies of executed paths and used features/flags. The fix was verified with the compilation replay file from the bug report. > > I am running testing and will let you know the results. Looks reasonable. Is there maybe some kind of stress-flag we could develop? Because you are saying the reproduction depends on some specific probabilities. src/hotspot/share/opto/escape.cpp line 4725: > 4723: } else { > 4724: #ifdef ASSERT > 4725: if (!n->is_Mem()) { Would it make sense to turn this into a product check? Could we bail out gracefully at this point? ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23938#pullrequestreview-2666669850 PR Review Comment: https://git.openjdk.org/jdk/pull/23938#discussion_r1984711046 From chagedorn at openjdk.org Fri Mar 7 09:23:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Mar 2025 09:23:53 GMT Subject: RFR: 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 09:00:00 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/opaquenode.cpp line 154: >> >>> 152: _useless = true; >>> 153: igvn._worklist.push(this); >>> 154: } >> >> Here you directly push to worklist. And for `OpaqueTemplateAssertionPredicateNode` you seem to do it at the call-site. We should probably unify this. If you want to push inside `mark_useless`, then you should probably adjust the code for `OpaqueMultiversioning` as well. > > Pushing inside of `mark_useless` has the advantage that one cannot forget pushing. Forgot to comment on that. With the next PR, I'm updating the elimination of Template Assertion Predicates and thus require to call `mark_useless()` in a first step to mark them all useless. Afterwards, I'm marking those non-useless again that can be found from loops. Thus, we do not want to always add all nodes to the worklist when calling `mark_useless()` if they are not going to be removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23941#discussion_r1984720518 From dfenacci at openjdk.org Fri Mar 7 09:26:56 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 7 Mar 2025 09:26:56 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v5] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 07:54:34 GMT, Marc Chevalier wrote: >> This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). 
Care must be taken in the case where con1 + con2 is greater than or equal to the number of bits in the integer type. In this case, we must simplify to 0. >> >> Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32-bit integer into a 16-bit integer, with sign extension. When storing this into a 16-bit field, this can be simplified into a simple `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR look like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitudes. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed so as not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. >> >> This also works for multiplications by powers of 2 since they are already translated into shifts. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > + comment on why not zerocon I just left one more small comment. Otherwise it looks good to me. Thanks @marc-chevalier! test/hotspot/jtreg/compiler/c2/irTests/LShiftINodeIdealizationTests.java line 178: > 176: > 177: short[] arr = new short[1]; > 178: arr[0] = (short)1; What do you think about using a random short value here? (I was just thinking it might slightly increase the chances of spotting if something is wrong with the shifts...) ------------- Marked as reviewed by dfenacci (Committer).
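The shift-collapsing rule discussed in this review can be illustrated at the Java level. The following is only a minimal sketch, not the C2 implementation: the class `ShiftCollapse` and its methods are hypothetical names made up for this illustration, and the constants are assumed to lie in the range 0..31. It shows why the collapsed form has to fold to 0 once the summed shift count reaches the bit width, since Java masks an `int` shift count to its low 5 bits.

```java
public class ShiftCollapse {
    // Reference form: two separate left shifts by constants.
    static int doubleShift(int x, int con1, int con2) {
        return (x << con1) << con2;
    }

    // Collapsed form (illustrative only), assuming 0 <= con1, con2 < 32.
    static int collapsed(int x, int con1, int con2) {
        int sum = con1 + con2;
        // Java masks an int shift count to 5 bits, so "x << 32" would be x, not 0.
        // All bits are shifted out once the sum reaches 32, hence the explicit 0.
        return sum >= 32 ? 0 : x << sum;
    }

    public static void main(String[] args) {
        int x = 0x12345;
        // Sum below the bit width: a single shift by 19 matches the double shift.
        System.out.println(doubleShift(x, 3, 16) == collapsed(x, 3, 16));
        // Sum equal to the bit width: both forms must produce 0.
        System.out.println(doubleShift(x, 20, 12) == collapsed(x, 20, 12));
        // Sign extension trick: (x << 16) >> 16 equals a narrowing cast to short.
        System.out.println((short) x == ((x << 16) >> 16));
    }
}
```

A randomized variant of the same comparison, along the lines of the random short value suggested above, would exercise more bit patterns than a fixed input.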
PR Review: https://git.openjdk.org/jdk/pull/23728#pullrequestreview-2666681057 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1984716376 From chagedorn at openjdk.org Fri Mar 7 09:29:31 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Mar 2025 09:29:31 GMT Subject: RFR: 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly [v2] In-Reply-To: References: Message-ID: > This patch is a preparatory patch for fixing https://github.com/openjdk/jdk/pull/23823 (already out for review) without relying on predicate matching during IGVN (currently proposed matching with IGVN but after looking into it and further discussions with @rwestrel, we think it's best to do it outside IGVN). > > ### Update Assertion Predicate Killing Mechanism > The main contribution of this patch is to update the killing mechanism of Assertion Predicates. Currently, we kill an Assertion Predicate by replacing its `Opaque*AssertionPredicate` node with `ConI [int:1]`. This creates problems in our predicate matching code (see for example, [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637)). To fix this and prepare for https://github.com/openjdk/jdk/pull/23823, we move to a different approach. > > #### Mark Opaque*AssertionPredicate` Nodes Useless > Instead of directly inserting constants, we mark `Opaque*AssertionPredicate` nodes useless when we find that an Assertion Predicate should be removed. In the next IGVN phase, we are folding it by taking the success path. This requires the following updates: > - Introduce `_useless` flags at `Opaque*AssertionPredicate` nodes and associated mark methods. > - Check the flags in `Value()` and return `TypeInt::ONE` if we find them useless. > - Adding `cmp()` method for `OpaqueInitializedAssertionPredicate` to avoid commoning up a useless with a useful node. 
> - `OpaqueTemplateAssertionPredicate` nodes are by design unique to the Template Assertion Predicate because they have non-hashable `OpaqueLoop*` nodes on their input paths. I still added `hash()` that returns `NO_HASH` as an additional guarantee/contract. > > #### Update Predicate Iteration Code > To make this new approach work, we need to update the predicate iteration code. We still match Assertion Predicates with an `OpaqueAssertion*Node` to skip over them but we skip to call the `visit()` method since there should not be any work to be done for killed Assertion Predicates. This includes the additional change: > - Refactoring `RegularPredicateBlockIterator::for_each()` which became bigger. I extracted some methods and decided to use a field instead of a local variable. This meant that I also had to remove `const` from the `for_each()` methods which I think is fine for a method that does iterations and needs to do some book keeping. > > #### Other Updates > I've also applied some small refactorings of touched code. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Make OpaqueInitializedAssertionPredicateNode NO_HASH ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23941/files - new: https://git.openjdk.org/jdk/pull/23941/files/c76f5044..aa88ded9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23941&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23941&range=00-01 Stats: 10 lines in 2 files changed: 4 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23941/head:pull/23941 PR: https://git.openjdk.org/jdk/pull/23941 From chagedorn at openjdk.org Fri Mar 7 09:29:31 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Mar 2025 09:29:31 GMT Subject: RFR: 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly [v2] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 08:55:47 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Make OpaqueInitializedAssertionPredicateNode NO_HASH > > src/hotspot/share/opto/opaquenode.cpp line 142: > >> 140: bool OpaqueInitializedAssertionPredicateNode::cmp(const Node &n) const { >> 141: return _useless == n.as_OpaqueInitializedAssertionPredicate()->is_useless(); >> 142: } > > Ah interesting. But do we ever want to common these nodes? > If we never want to common them, you could just fix it with the hash, right? > > 35 class Opaque1Node : public Node { > 36 virtual uint hash() const ; // { return NO_HASH; } I thought it does not matter if they common up. But maybe it does in some edge cases when we have two Initialized Assertion Predicates that have the very same condition but one should be removed. Anyway, might be safer to just not allow it.
I think the effect should be negligible in terms of performance/footprint. Pushed an update ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23941#discussion_r1984729158 From epeter at openjdk.org Fri Mar 7 09:41:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 09:41:55 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away during IGVN In-Reply-To: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: On Thu, 27 Feb 2025 13:07:46 GMT, Christian Hagedorn wrote: > The patch fixes the issue of creating an Initialized Assertion Predicate at a loop X from a Template Assertion Predicate that was originally created for a loop Y. Using the unrelated loop values from loop Y for the Initialized Assertion Predicate will let it fail during runtime and we execute a `halt` instruction. This was originally reported with [JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428). > > Note that most of the line changes are from new tests. > > ### The Problem > There are multiple test cases triggering the same problem. In the following, when referring to "the test case", I'm referring to `testTemplateAssertionPredicateNotRemovedHalt()` which was written from scratch and contains more detailed comments explaining how we end up with executing a `Halt` node in more details. > > #### An Inner Loop without Parse Predicates > The graph in `testTemplateAssertionPredicateNotRemovedHalt()` looks like this after creating `LoopNodes` for the outer `for` and inner `while (true)` loop: > > ![image](https://github.com/user-attachments/assets/7ac60e35-0b7e-4f04-b9dd-6eb8c8654a15) > > We only have Parse Predicates for the outer loop. Why? 
> > Before beautify loop, we have the following region which merges multiple backedges - the one from the `for` loop and the one from the `while (true)` loop: > > ![image](https://github.com/user-attachments/assets/7895161d-5ac1-46d6-93fe-5ab90ef24ab9) > > In `IdealLoopTree::merge_many_backedges()`, we notice that the hottest backedge is hot enough such that it is worth to have a separate merge point region for the inner and outer loop. We set everything up and eventually in `IdealLoopTree::split_outer_loop()`, we create a second `LoopNode`. > > For this inner `LoopNode`, we cannot set up `Parse Predicates` with the same UCTs as used for the outer loop. It would be incorrect when taking the trap to re-execute the inner and outer loop again while having already executed some of the outer loop's iterations. Thus, we get the graph shape with back-to-back `LoopNodes` as shown above. > > #### Predicates from a Folded Loop End up at Another Loop > As described in the previous section, we have an inner and outer `LoopNode` while the inner does not have Parse Predicates. In a series of events (see test case comments for more details), we first hoist a range check out of the outer loop during Loop Predication with a Template Assertion Predicate. Then, we fold the outer loop away because we find that it is only running for a single iteration and the bac... 
Generally looks good, I have a few suggestions and questions :) src/hotspot/share/opto/loopnode.cpp line 2517: > 2515: KillTemplateAssertionPredicates kill_template_assertion_predicates(phase->is_IterGVN()); > 2516: PredicateIterator predicate_iterator(skip_strip_mined()->in(EntryControl)); > 2517: predicate_iterator.for_each(kill_template_assertion_predicates); Suggestion: KillTemplateAssertionPredicateVisitor kill_template_assertion_predicate_visitor(phase->is_IterGVN()); PredicateIterator predicate_iterator(skip_strip_mined()->in(EntryControl)); predicate_iterator.for_each(kill_template_assertion_predicate_visitor); Nit: `KillTemplateAssertionPredicateVisitor` might be nicer because it tells me from the beginning that it is a visitor. `KillTemplateAssertionPredicates` had me thinking that is some kind of constructor that goes ahead and kills things already. The plural "predicates" also indicated that it would do that for all predicates. src/hotspot/share/opto/predicates.cpp line 62: > 60: return false; > 61: } > 62: return has_assertion_predicate_opaque_or_con_input(maybe_success_proj); Quick control question: Could skipping over constants also mean that we traverse up too far at some point? Like skipping not not just the predicates but also an unrelated check that is about to constant fold? I suppose that should not really create issues? src/hotspot/share/opto/predicates.hpp line 1230: > 1228: // The visitor visits all Template Assertion Predicates and kills them by marking them useless. They will be removed > 1229: // during next round of IGVN. 
> 1230: class KillTemplateAssertionPredicates : public PredicateVisitor { Suggestion: class KillTemplateAssertionPredicateVisitor : public PredicateVisitor { src/hotspot/share/opto/predicates.hpp line 1232: > 1230: class KillTemplateAssertionPredicates : public PredicateVisitor { > 1231: PhaseIterGVN* _igvn; > 1232: public: Suggestion: public: test/hotspot/jtreg/compiler/predicates/assertion/TestAssertionPredicates.java line 105: > 103: * @test id=NoFlags > 104: * @bug 8288981 8350579 > 105: * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:+AbortVMOnCompilationFailure Can you explain why you are enabling `AbortVMOnCompilationFailure`? test/hotspot/jtreg/compiler/predicates/assertion/TestAssertionPredicates.java line 159: > 157: > 158: // Runs most of the tests except the really time-consuming ones. > 159: static void runAllTests() { Sounds like a bit of a contradiction ? `runAllTests` -> `runAllFastTests`? ------------- PR Review: https://git.openjdk.org/jdk/pull/23823#pullrequestreview-2666689146 PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r1984724182 PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r1984734437 PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r1984721363 PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r1984740043 PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r1984743343 PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r1984746128 From epeter at openjdk.org Fri Mar 7 09:41:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 09:41:56 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away during IGVN In-Reply-To: References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: On Fri, 7 Mar 2025 09:37:16 GMT, Emanuel Peter wrote: >> The patch fixes the issue of creating an Initialized 
Assertion Predicate at a loop X from a Template Assertion Predicate that was originally created for a loop Y. Using the unrelated loop values from loop Y for the Initialized Assertion Predicate will let it fail during runtime and we execute a `halt` instruction. This was originally reported with [JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428). >> >> Note that most of the line changes are from new tests. >> >> ### The Problem >> There are multiple test cases triggering the same problem. In the following, when referring to "the test case", I'm referring to `testTemplateAssertionPredicateNotRemovedHalt()` which was written from scratch and contains more detailed comments explaining how we end up with executing a `Halt` node in more details. >> >> #### An Inner Loop without Parse Predicates >> The graph in `testTemplateAssertionPredicateNotRemovedHalt()` looks like this after creating `LoopNodes` for the outer `for` and inner `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7ac60e35-0b7e-4f04-b9dd-6eb8c8654a15) >> >> We only have Parse Predicates for the outer loop. Why? >> >> Before beautify loop, we have the following region which merges multiple backedges - the one from the `for` loop and the one from the `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7895161d-5ac1-46d6-93fe-5ab90ef24ab9) >> >> In `IdealLoopTree::merge_many_backedges()`, we notice that the hottest backedge is hot enough such that it is worth to have a separate merge point region for the inner and outer loop. We set everything up and eventually in `IdealLoopTree::split_outer_loop()`, we create a second `LoopNode`. >> >> For this inner `LoopNode`, we cannot set up `Parse Predicates` with the same UCTs as used for the outer loop. It would be incorrect when taking the trap to re-execute the inner and outer loop again while having already executed some of the outer loop's iterations. 
Thus, we get the graph shape with back-to-back `LoopNodes` as shown above. >> >> #### Predicates from a Folded Loop End up at Another Loop >> As described in the previous section, we have an inner and outer `LoopNode` while the inner does not have Parse Predicates. In a series of events (see test case comments for more details), we first hoist a range check out of the outer loop during Loop Predication with a Template Assertion Predicate. Then, we fold the outer loop away because we find that it is... > > test/hotspot/jtreg/compiler/predicates/assertion/TestAssertionPredicates.java line 159: > >> 157: >> 158: // Runs most of the tests except the really time-consuming ones. >> 159: static void runAllTests() { > > Sounds like a bit of a contradiction ? > > `runAllTests` -> `runAllFastTests`? Which ones are the really time-consuming ones? And why do you not run them here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r1984748280 From rcastanedalo at openjdk.org Fri Mar 7 09:48:35 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 7 Mar 2025 09:48:35 GMT Subject: RFR: 8348645: IGV: visualize live ranges [v5] In-Reply-To: References: Message-ID: > This changeset extends IGV with live range visualization. It introduces live ranges as first-class IGV entities and displays them along with the control-flow graph in the CFG view. Visualizing liveness information should hopefully make C2's register allocator easier to understand, diagnose, debug, and enhance. > > Live ranges are visible in C2 phases where liveness information is available, that is, phases `Initial liveness` to `Fix up spills` at IGV print level 4 or greater. 
For example, running a debug build of the JVM as follows: > > > java -Xbatch -XX:CompileCommand=IGVPrintLevel,java.util.HashMap::newNode,4 > > > produces the following visualization for the `Initial spilling` phase: > > ![initial-spilling](https://github.com/user-attachments/assets/1ecf74f5-92a8-4866-b1ec-2323bb0c428e) > > Live ranges are first-class IGV entities, meaning that the user can: > > - search, select, and extract them; > > ![search-extract](https://github.com/user-attachments/assets/8e0dfa59-457f-49cb-b2b5-1d202301c79d) > > - examine their properties in the `Properties` window or via tooltips; > > ![properties](https://github.com/user-attachments/assets/68d2d23b-b986-4d2e-835c-b661bce0de23) > > - navigate to related IGV entities via a pop-up menu; and > > ![popup](https://github.com/user-attachments/assets/21de2fef-d36a-42d5-b828-2696d87a18ea) > > - program filters that act on them according to their properties. > > ![filters](https://github.com/user-attachments/assets/e993b067-d0b8-452c-a885-c4e601e31e1c) > > Live ranges are connected to nodes by a use-def relation: a node can define zero or one live ranges, and use multiple live ranges; a live range can be defined and used by multiple nodes. Consequently, a live range in IGV is visible if and only if all its related nodes are visible (fully or semi-transparently). Generally, the start and end of a live range are vertically aligned with the nodes that first define and last use the live range. To reflect accurately the semantics of Phi nodes w.r.t. liveness, the visualization treats live ranges related by Phi nodes specially: live ranges used by a Phi node end at the bottom of the corresponding predecessor basic blocks, whereas live ranges defined by a Phi node start at the top of the node's basic block.
The following screenshot shows an example of a Phi node (`48 Phi`) joining live ranges `L8` and `L13` into `L15`: > > ![phi](https://github.com/user-attachments/assets/0ef8aa1d-523d-4391-982e-6b74c2016a3c) > > The changeset extends the IGV graph printing logic in HotSpot t... Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Show liveness info extra line only when liveness information is available ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23558/files - new: https://git.openjdk.org/jdk/pull/23558/files/51718b90..efbde14a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23558&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23558&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23558.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23558/head:pull/23558 PR: https://git.openjdk.org/jdk/pull/23558 From rcastanedalo at openjdk.org Fri Mar 7 09:48:35 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 7 Mar 2025 09:48:35 GMT Subject: RFR: 8348645: IGV: visualize live ranges [v4] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 16:20:28 GMT, Damon Fenacci wrote: > Just a minor aesthetic thing: I noticed that in phases with no liveness information, the liveness information in each node is replaced by an empty space (instead of nothing) Good catch, thanks Damon! Commit efbde14a should address that, please check that it works as you expect. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23558#issuecomment-2705989370 From mli at openjdk.org Fri Mar 7 09:56:52 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 7 Mar 2025 09:56:52 GMT Subject: RFR: 8351348: x86_64: remove redundant supports_float16 check In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 13:18:07 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch?
> `supports_float16()` is invoked in `is_intrinsic_available -> is_intrinsic_supported`, so there is no need to call it explicitly. > > Thanks Thanks for having a look! From a code readability standpoint, it seems worthwhile, as the `supports_float16` check is already part of the check in `vmIntrinsics::is_intrinsic_available`, and the names should already imply this. But you're right too: from a performance standpoint, it's not worth doing. On the other hand, the code is not on a performance-critical path. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23932#issuecomment-2706013532 From bulasevich at openjdk.org Fri Mar 7 10:02:05 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 7 Mar 2025 10:02:05 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v14] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Thu, 6 Mar 2025 13:27:45 GMT, Boris Ulasevich wrote: >> Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: >> >> - swap matadata and jvmci data in outputs according to data layout >> - cleanup >> - returning oops back to nmethods. jtreg: Ok, performance: Ok.
todo: cleanup >> - Address review comments: cleanup, move fields to avoid padding, fix CodeBlob purge to call os::free, fix nmethod::print, update Layout description >> - add a separate adrp_movk function to to support targets located more than 4GB away >> - Force the use of movk in combination with adrp and ldr instructions to address scenarios >> where os::malloc allocates buffers beyond the typical ±4GB range accessible with adrp >> - Fixing TestFindInstMemRecursion test fail with XX:+StressReflectiveCode option: >> _relocation_size can exceed 64Kb, in this case _metadata_offset do not fit into int16. >> Fix: use _oops_size int16 field to calculate metadata offset >> - removing dead code >> - a bit of cleanup and addressing review suggestions >> - rework movoop for not_supports_instruction_patching case: correcting in ldr_constant and relocations fixup >> - ... and 5 more: https://git.openjdk.org/jdk/compare/24f41aa8...bc8c590c > >> Please swap `matadata` and `jvmci data` in outputs ... >> >> Also please merge latest JDK which have SA cleanup related to compilers: #23782 > > Yes. Thanks! > @bulasevich is it ready for testing now? @vnkozlov yes, it's ready for testing. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2706024125 From mli at openjdk.org Fri Mar 7 10:53:45 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 7 Mar 2025 10:53:45 GMT Subject: RFR: 8351345: [IR Framework] Improve reported disabled IR verification messages [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this trivial patch? > Currently, it only reports that some flag is non-whitelisted, but does not print out the flag explicitly, but it's helpful to do so. > > Thanks!
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: refine ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23931/files - new: https://git.openjdk.org/jdk/pull/23931/files/e0fe7322..97a68e2e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23931&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23931&range=00-01 Stats: 42 lines in 1 file changed: 21 ins; 9 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/23931.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23931/head:pull/23931 PR: https://git.openjdk.org/jdk/pull/23931 From mli at openjdk.org Fri Mar 7 10:53:46 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 7 Mar 2025 10:53:46 GMT Subject: RFR: 8351345: [IR Framework] Improve reported disabled IR verification messages [v2] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 21:45:44 GMT, Christian Hagedorn wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> refine > > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 752: > >> 750: if (!flag.startsWith("-D") && !flag.startsWith("-e") && JTREG_WHITELIST_FLAGS.stream().noneMatch(flag::contains)) { >> 751: // Found VM flag that is not whitelisted >> 752: System.out.println("Non-whitelisted JTreg VM or Javaoptions flag: " + flag); > > That's a good idea! Is the intention to just report the first non-whitelisted flag found or all of them? I guess just one of them is fine to indicate the reason for not performing IR matching (could be verbose to report all non-whitelisted flags otherwise). 
> > I would merge this message with the already existing one here and remove that one in favor of the new one: > https://github.com/openjdk/jdk/blob/a23fb0af65f491ef655ba114fcc8032a09a55213/test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java#L612-L615 > > Another thought while cleaning this up: We could also improve this message > https://github.com/openjdk/jdk/blob/a23fb0af65f491ef655ba114fcc8032a09a55213/test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java#L597-L602 > and split it into three separate bailouts + messages. Could be part of this RFE (you could then set a new title for the issue to something like `[IR Framework] Improve reported disabled IR verification messages`). Thanks for the suggestion! It makes sense to me. Could you have another look? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23931#discussion_r1984854330 From epeter at openjdk.org Fri Mar 7 10:55:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 10:55:34 GMT Subject: RFR: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning Message-ID: With https://github.com/openjdk/jdk/pull/23865 I mark the `OpaqueMultiversioning` useless once there are no loops for the multiversioning any more. The idea was that this way the `multiversion_if` would be folded away once the loops are gone, and then we should not encounter the `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`. But there is a case where we do not remove the `OpaqueMultiversioning` fast enough, see the attached regression test: - The loops disappear during IGVN. - At the beginning of the next loop-opts we mark the `OpaqueMultiversioning` as useless. - Later during loop-opts we encounter the useless `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`, and assert. - But in the IGVN after this loop-opts phase we would have constant folded the `OpaqueMultiversioning` and `multiversion_if` anyway...
we just did not get there fast enough ;) Hence I propose to just create an explicit bailout for useless `OpaqueMultiversioning` nodes in `PhaseIdealLoop::conditional_move`. ------------- Commit messages: - the fix - JDK-8351392 Changes: https://git.openjdk.org/jdk/pull/23943/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23943&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351392 Stats: 32 lines in 3 files changed: 32 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23943.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23943/head:pull/23943 PR: https://git.openjdk.org/jdk/pull/23943 From chagedorn at openjdk.org Fri Mar 7 11:05:30 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Mar 2025 11:05:30 GMT Subject: RFR: 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly [v3] In-Reply-To: References: Message-ID: <9-xrDx7B0nuNTGRghlUjNqYgYtyOiKxiPuCEE1DCxxc=.3de4dbb4-bd53-44aa-9682-613fa6d1341d@github.com> > This patch is a preparatory patch for fixing https://github.com/openjdk/jdk/pull/23823 (already out for review) without relying on predicate matching during IGVN (currently proposed matching with IGVN but after looking into it and further discussions with @rwestrel, we think it's best to do it outside IGVN). > > ### Update Assertion Predicate Killing Mechanism > The main contribution of this patch is to update the killing mechanism of Assertion Predicates. Currently, we kill an Assertion Predicate by replacing its `Opaque*AssertionPredicate` node with `ConI [int:1]`. This creates problems in our predicate matching code (see for example, [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637)). To fix this and prepare for https://github.com/openjdk/jdk/pull/23823, we move to a different approach. 
> > #### Mark Opaque*AssertionPredicate` Nodes Useless > Instead of directly inserting constants, we mark `Opaque*AssertionPredicate` nodes useless when we find that an Assertion Predicate should be removed. In the next IGVN phase, we are folding it by taking the success path. This requires the following updates: > - Introduce `_useless` flags at `Opaque*AssertionPredicate` nodes and associated mark methods. > - Check the flags in `Value()` and return `TypeInt::ONE` if we find them useless. > - Adding `cmp()` method for `OpaqueInitializedAssertionPredicate` to avoid commoning up a useless with a useful node. > - `OpaqueTemplateAssertionPredicate` nodes are by design unique to the Template Assertion Predicate because they have non-hashable `OpaqueLoop*` nodes on their input paths. I still added `hash()` that returns `NO_HASH` as an additional guarantee/contract. > > #### Update Predicate Iteration Code > To make this new approach work, we need to update the predicate iteration code. We still match Assertion Predicates with an `OpaqueAssertion*Node` to skip over them but we skip to call the `visit()` method since there should not be any work to be done for killed Assertion Predicates. This includes the additional change: > - Refactoring `RegularPredicateBlockIterator::for_each()` which became bigger. I extracted some methods and decided to use a field instead of a local variable. This meant that I also had to remove `const` from the `for_each()` methods which I think is fine for a method that does iterations and needs to do some book keeping. > > #### Other Updates > I've also applied some small refactorings of touched code. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Update mark_useless() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23941/files - new: https://git.openjdk.org/jdk/pull/23941/files/aa88ded9..1ddc8233 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23941&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23941&range=01-02 Stats: 16 lines in 5 files changed: 5 ins; 6 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23941/head:pull/23941 PR: https://git.openjdk.org/jdk/pull/23941 From chagedorn at openjdk.org Fri Mar 7 11:09:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Mar 2025 11:09:07 GMT Subject: RFR: 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly [v4] In-Reply-To: References: Message-ID: > This patch is a preparatory patch for fixing https://github.com/openjdk/jdk/pull/23823 (already out for review) without relying on predicate matching during IGVN (currently proposed matching with IGVN but after looking into it and further discussions with @rwestrel, we think it's best to do it outside IGVN). > > ### Update Assertion Predicate Killing Mechanism > The main contribution of this patch is to update the killing mechanism of Assertion Predicates. Currently, we kill an Assertion Predicate by replacing its `Opaque*AssertionPredicate` node with `ConI [int:1]`. This creates problems in our predicate matching code (see for example, [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637)). To fix this and prepare for https://github.com/openjdk/jdk/pull/23823, we move to a different approach. > > #### Mark Opaque*AssertionPredicate` Nodes Useless > Instead of directly inserting constants, we mark `Opaque*AssertionPredicate` nodes useless when we find that an Assertion Predicate should be removed. 
In the next IGVN phase, we are folding it by taking the success path. This requires the following updates: > - Introduce `_useless` flags at `Opaque*AssertionPredicate` nodes and associated mark methods. > - Check the flags in `Value()` and return `TypeInt::ONE` if we find them useless. > - Adding `cmp()` method for `OpaqueInitializedAssertionPredicate` to avoid commoning up a useless with a useful node. > - `OpaqueTemplateAssertionPredicate` nodes are by design unique to the Template Assertion Predicate because they have non-hashable `OpaqueLoop*` nodes on their input paths. I still added `hash()` that returns `NO_HASH` as an additional guarantee/contract. > > #### Update Predicate Iteration Code > To make this new approach work, we need to update the predicate iteration code. We still match Assertion Predicates with an `OpaqueAssertion*Node` to skip over them but we skip to call the `visit()` method since there should not be any work to be done for killed Assertion Predicates. This includes the additional change: > - Refactoring `RegularPredicateBlockIterator::for_each()` which became bigger. I extracted some methods and decided to use a field instead of a local variable. This meant that I also had to remove `const` from the `for_each()` methods which I think is fine for a method that does iterations and needs to do some book keeping. > > #### Other Updates > I've also applied some small refactorings of touched code. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Update mark_useless for OpaqueMultiversioning ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23941/files - new: https://git.openjdk.org/jdk/pull/23941/files/1ddc8233..8208b640 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23941&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23941&range=02-03 Stats: 15 lines in 3 files changed: 7 ins; 6 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23941/head:pull/23941 PR: https://git.openjdk.org/jdk/pull/23941 From chagedorn at openjdk.org Fri Mar 7 11:09:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Mar 2025 11:09:07 GMT Subject: RFR: 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly [v4] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 09:20:11 GMT, Christian Hagedorn wrote: >> Pushing inside of `mark_useless` has the advantage that one cannot forget pushing. > > Forgot to comment on that. With the next PR, I'm updating the elimination of Template Assertion Predicates and thus require to call `mark_useless()` in a first step to mark them all useless. Afterwards, I'm marking those non-useless again that can be found from loops. Thus, we do not want to always add all nodes to the worklist when calling `mark_useless()` if they are not going to be removed. While working and testing the follow-up PR, I found that I could still do it differently than originally planned. I'm thus changing this to match what we have in `OpaqueInitializedAssertionPredicateNode`. I've also updated `OpaqueMultiversioning` accordingly. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23941#discussion_r1984879711 From chagedorn at openjdk.org Fri Mar 7 11:28:57 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Mar 2025 11:28:57 GMT Subject: RFR: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 10:35:58 GMT, Emanuel Peter wrote: > With https://github.com/openjdk/jdk/pull/23865 I mark the `OpaqueMultiversioning` useless once there are no loops for the multiversioning any more. The idea was that this way the `multiversion_if` would be folded away once the loops are gone, and then we should not encounter the `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`. > > But there is a case where we do not remove the `OpaqueMultiversioning` fast enough, see the attached regression test: > - The loops disappear during IGVN. > - At the beginning of the next loop-opts we mark the `OpaqueMultiversioning` as useless. > - Later during loop-opts we encounter the useless `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`, and assert. > - But in the IGVN after this loop-opts phase we would have constant folded the `OpaqueMultiversioning` and `multiversion_if` anyway... we just did not get there fast enough ;) > > Hence I propose to just create an explicit bailout for useless `OpaqueMultiversioning` nodes in `PhaseIdealLoop::conditional_move`. src/hotspot/share/opto/loopopts.cpp line 794: > 792: return nullptr; > 793: } > 794: if (bol->is_OpaqueMultiversioning() && bol->as_OpaqueMultiversioning()->is_useless()) { Isn't any `OpaqueMultiversioning` we find here supposed to be useless? Can we assert that instead? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23943#discussion_r1984903770 From epeter at openjdk.org Fri Mar 7 11:39:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 11:39:59 GMT Subject: RFR: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 11:26:39 GMT, Christian Hagedorn wrote: >> With https://github.com/openjdk/jdk/pull/23865 I mark the `OpaqueMultiversioning` useless once there are no loops for the multiversioning any more. The idea was that this way the `multiversion_if` would be folded away once the loops are gone, and then we should not encounter the `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`. >> >> But there is a case where we do not remove the `OpaqueMultiversioning` fast enough, see the attached regression test: >> - The loops disappear during IGVN. >> - At the beginning of the next loop-opts we mark the `OpaqueMultiversioning` as useless. >> - Later during loop-opts we encounter the useless `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`, and assert. >> - But in the IGVN after this loop-opts phase we would have constant folded the `OpaqueMultiversioning` and `multiversion_if` anyway... we just did not get there fast enough ;) >> >> Hence I propose to just create an explicit bailout for useless `OpaqueMultiversioning` nodes in `PhaseIdealLoop::conditional_move`. > > src/hotspot/share/opto/loopopts.cpp line 794: > >> 792: return nullptr; >> 793: } >> 794: if (bol->is_OpaqueMultiversioning() && bol->as_OpaqueMultiversioning()->is_useless()) { > > Isn't any `OpaqueMultiversioning` we find here supposed to be useless? Can we assert that instead? That is essentially what I already do. If we find a **useful** node, then we just hit the assert further down. But if you want then I can refactor it explicitly.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23943#discussion_r1984916090 From chagedorn at openjdk.org Fri Mar 7 11:42:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Mar 2025 11:42:53 GMT Subject: RFR: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning In-Reply-To: References: Message-ID: <7QZKsZq_DOEXbw-zAzppWAc-2rQN31fDf8mMPH7BOTE=.8b332698-79d2-4e6b-8ed7-979bf053940b@github.com> On Fri, 7 Mar 2025 11:37:22 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopopts.cpp line 794: >> >>> 792: return nullptr; >>> 793: } >>> 794: if (bol->is_OpaqueMultiversioning() && bol->as_OpaqueMultiversioning()->is_useless()) { >> >> Isn't any `OpaqueMultiversioning` we find here supposed to be useless? Can we assert that instead? > > That is essentially what I already do. If we find a **useful** node, then we just hit the assert further down. > > But if you want then I can refactor it explicitly. It might be more explicit with a separate assert that an `OpaqueMultiversioning` should be useless. And we can avoid the `is_useless()` check in product - but of course that should not make much of a difference performance-wise :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23943#discussion_r1984919501 From epeter at openjdk.org Fri Mar 7 12:20:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 12:20:05 GMT Subject: RFR: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning [v2] In-Reply-To: References: Message-ID: <3KvvUSvsuELDH07hemoqlDJNICAU5RfkrysTBOCwXjA=.6d1bf7a5-32d7-476e-8f87-dba083d0d3eb@github.com> > With https://github.com/openjdk/jdk/pull/23865 I mark the `OpaqueMultiversioning` useless once there are no loops for the multiversioning any more. The idea was that this way the `multiversion_if` would be folded away once the loops are gone, and then we should not encounter the `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`.
> > But there is a case where we do not remove the `OpaqueMultiversioning` fast enough, see the attached regression test: > - The loops disappear during IGVN. > - At the beginning of the next loop-opts we mark the `OpaqueMultiversioning` as useless. > - Later during loop-opts we encounter the useless `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`, and assert. > - But in the IGVN after this loop-opts phase we would have constant folded the `OpaqueMultiversioning` and `multiversion_if` anyway... we just did not get there fast enough ;) > > Hence I propose to just create an explicit bailout for useless `OpaqueMultiversioning` nodes in `PhaseIdealLoop::conditional_move`. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23943/files - new: https://git.openjdk.org/jdk/pull/23943/files/caeb3a55..e852e377 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23943&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23943&range=00-01 Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23943.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23943/head:pull/23943 PR: https://git.openjdk.org/jdk/pull/23943 From epeter at openjdk.org Fri Mar 7 12:20:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 12:20:06 GMT Subject: RFR: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning [v2] In-Reply-To: <7QZKsZq_DOEXbw-zAzppWAc-2rQN31fDf8mMPH7BOTE=.8b332698-79d2-4e6b-8ed7-979bf053940b@github.com> References: <7QZKsZq_DOEXbw-zAzppWAc-2rQN31fDf8mMPH7BOTE=.8b332698-79d2-4e6b-8ed7-979bf053940b@github.com> Message-ID: On Fri, 7 Mar 2025 11:40:14 GMT, Christian Hagedorn wrote: >> That is essentially what I already do. If we find a **useful** node, then we just hit the assert further down.
>> >> But if you want then I can refactor it explicitly. > > It might be more explicit with a separate assert that an `OpaqueMultiversioning` should be useless. And we can avoid the `is_useless()` check in product - but of course that should not make much of a difference performance-wise :-) Ok, I refactored it :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23943#discussion_r1984963252 From syan at openjdk.org Fri Mar 7 13:02:03 2025 From: syan at openjdk.org (SendaoYan) Date: Fri, 7 Mar 2025 13:02:03 GMT Subject: RFR: 8351233: [ASAN] avx2-emu-funcs.hpp:151:20: error: ‘D.82188’ is used uninitialized In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 03:35:20 GMT, SendaoYan wrote: > Hi all, > > The function `_mm256_loadu_si256` returns `__m256i` by value, so `const __m256i &perm` should be replaced with `const __m256i perm`. > > The function implementation in gcc/clang compiler header: > > 1. gcc: lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h > > > extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) > _mm256_loadu_si256 (__m256i_u const *__P) > { > return *__P; > } > > > 2. clang: lib64/clang/17/include/avxintrin.h > > > static __inline __m256i __DEFAULT_FN_ATTRS > _mm256_loadu_si256(__m256i_u const *__p) > { > struct __loadu_si256 { > __m256i_u __v; > } __attribute__((__packed__, __may_alias__)); > return ((const struct __loadu_si256*)__p)->__v; > } > > > Additional testing: > > - [x] jtreg tests(include tier1/2/3 etc.) on linux-x64(AMD EPYC 9T24 96-Core Processor) with release build > - [x] jtreg tests(include tier1/2/3 etc.) on linux-x64(AMD EPYC 9T24 96-Core Processor) with fastdebug build GHA reports 1 test failure: 1. Job 'macos-aarch64 hs/tier1 gc' reports that "gc/TestAllocHumongousFragment.java#generational" timed out. This issue is already tracked by [JDK-8345958](https://bugs.openjdk.org/browse/JDK-8345958); it is unrelated to this PR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23925#issuecomment-2706398809 From duke at openjdk.org Fri Mar 7 13:11:39 2025 From: duke at openjdk.org (Marc Chevalier) Date: Fri, 7 Mar 2025 13:11:39 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v6] In-Reply-To: References: Message-ID: > This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). Care must be taken in the case con1 + con2 is bigger than the number of bits in the integer type. In this case, we must simplify to 0. > > Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32 bits into a 16 bits integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simple `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR looks like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitude. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. > > This also works for multiplications by powers of 2 since they are already translated into shifts. > > Thanks, > Marc Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 16 additional commits since the last revision: - use a random number in testing - Merge branch 'master' into fix/missing-transformation-for-chain-of-shifts-multiplications-by-constants - + comment on why not zerocon - comment - Fix style in the few lines I haven't touched yet - Remove useless local, with especially helpful name - rename - Add test suggested by @dean-long exhibiting the difference between (x << 30) << 3 and x << 33 - improve simplification of double shifts in stores - actually return a new node - ... and 6 more: https://git.openjdk.org/jdk/compare/7b1cfa3b...8e93a12f ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23728/files - new: https://git.openjdk.org/jdk/pull/23728/files/b8c3d74f..8e93a12f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=04-05 Stats: 37758 lines in 1023 files changed: 17356 ins; 15313 del; 5089 mod Patch: https://git.openjdk.org/jdk/pull/23728.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23728/head:pull/23728 PR: https://git.openjdk.org/jdk/pull/23728 From epeter at openjdk.org Fri Mar 7 13:11:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 13:11:40 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v5] In-Reply-To: References: Message-ID: <0-rhfBkyIO8uh5uioQ4XDoEWGwRdfzah8GJrYvILeDM=.699a73a4-ffd2-42f1-b7df-4c32235b3218@github.com> On Thu, 27 Feb 2025 07:54:34 GMT, Marc Chevalier wrote: >> This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). Care must be taken in the case con1 + con2 is bigger than the number of bits in the integer type. In this case, we must simplify to 0. >> >> Moreover, the simplification logic of the sign extension trick had to be improved. 
For instance, we use `(x << 16) >> 16` to convert a 32 bits into a 16 bits integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simple `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR looks like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitude. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. >> >> This also works for multiplications by powers of 2 since they are already translated into shifts. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > + comment on why not zerocon @marc-chevalier This is good work, thanks for working on this :) I have a first batch of comments and suggestions. src/hotspot/share/opto/memnode.cpp line 3523: > 3521: // (StoreB ... (valIn) ) > 3522: // If (conIL > conIR) we are inventing 0 lower bits, and throwing > 3523: // away upper bits, but we are not introducing garbage bits by above. What do you mean with `by above`? src/hotspot/share/opto/memnode.cpp line 3527: > 3525: // (StoreB ... (LShiftI _ valIn (conIL - conIR)) ) > 3526: // This case happens when the store source was itself a left shift, that gets merged > 3527: // into the inner left shift of the sign-extension. Hmm, above you were talking about a left and a right shift. But now you seem to be talking about some "source" left shift and an "inner" left shift. It's a bit confusing. Are you talking about this case? `(StoreB ... 
(RShiftI _ (LShiftI _ (LShiftI _ valIn conIL1 ) conIL2 ) conIR) )` Where the two left shifts are already combined by Ideal earlier? src/hotspot/share/opto/memnode.cpp line 3534: > 3532: // 31 8 7 0 > 3533: // v[0..7] is meaningful, but v[8..31] is not. In this case, num_rejected_bits == 24 > 3534: // If we do the shift left then right by 24 bits, we get: It could be nice if you explicitly denoted the 3 cases, maybe even using some indentation for emphasis: Case 1: conIL == conIR Case 2: conIL > conIR Case 3: conIL < conIR src/hotspot/share/opto/memnode.cpp line 3556: > 3554: // | sign bit | v[0..5] | 0 | > 3555: // +------------------+---------+-----+ > 3556: // 31 8 7 2 1 0 Can you make a statement if this is ok or not? src/hotspot/share/opto/memnode.cpp line 3567: > 3565: // | sign bit | v[0..5] | 0 | > 3566: // +------------------+---------+-----+ > 3567: // 31 10 9 4 3 0 Ah, this is basically a second example of `conIL > conIR`. Can you say if this case is ok? src/hotspot/share/opto/memnode.cpp line 3568: > 3566: // +------------------+---------+-----+ > 3567: // 31 10 9 4 3 0 > 3568: // Do we also have a case where `conIL < conIR`? What happens then? src/hotspot/share/opto/memnode.cpp line 3576: > 3574: Node* val = in(MemNode::ValueIn); > 3575: if (val->Opcode() == Op_RShiftI) { > 3576: const TypeInt* conIR = phase->type(val->in(2))->isa_int(); Suggestion: Node* shr = in(MemNode::ValueIn); if (shr->Opcode() == Op_RShiftI) { const TypeInt* conIR = phase->type(shr->in(2))->isa_int(); Might as well to keep it consistent with `shl` below. src/hotspot/share/opto/memnode.cpp line 3577: > 3575: if (val->Opcode() == Op_RShiftI) { > 3576: const TypeInt* conIR = phase->type(val->in(2))->isa_int(); > 3577: if (conIR != nullptr && conIR->is_con() && (conIR->get_con() <= num_rejected_bits)) { Can you say why you need `conIR->get_con() <= num_rejected_bits` in a comment? 
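As a rough illustration of the store simplification discussed in these comments, the following Java sketch models a byte store as a cast to `byte` (only the low 8 bits survive, so 24 bits are rejected) and compares `(v << conIL) >> conIR` with `v << (conIL - conIR)`. The class and method names are made up for this note, and the exhaustive loop is only one way to read the `conIR <= num_rejected_bits` condition; it is not taken from the patch itself.

```java
public class StoreShiftSketch {
    // What a byte store keeps of the original expression: the low 8 bits.
    static byte storeOriginal(int v, int conIL, int conIR) {
        return (byte) ((v << conIL) >> conIR);
    }

    // What a byte store keeps of the simplified expression.
    static byte storeSimplified(int v, int conIL, int conIR) {
        return (byte) (v << (conIL - conIR));
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x7F, 0x80, 0x12345678, Integer.MIN_VALUE, Integer.MAX_VALUE};
        // Assumed range: conIR <= 24 (the rejected bits of a byte store) and conIL >= conIR.
        for (int v : samples) {
            for (int conIR = 0; conIR <= 24; conIR++) {
                for (int conIL = conIR; conIL <= 31; conIL++) {
                    if (storeOriginal(v, conIL, conIR) != storeSimplified(v, conIL, conIR)) {
                        throw new AssertionError("mismatch: v=" + v + " conIL=" + conIL + " conIR=" + conIR);
                    }
                }
            }
        }
        // With conIR > 24 the arithmetic right shift drags sign bits into the stored byte.
        System.out.println(storeOriginal(1, 31, 25));   // -64
        System.out.println(storeSimplified(1, 31, 25)); // 64
    }
}
```

The counterexample at the end suggests why the right-shift amount has to stay within the rejected bits: beyond that point, copies of the sign bit reach the low 8 bits that the store actually writes.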
src/hotspot/share/opto/mulnode.cpp line 980: > 978: // con0 is the rhs of outer_shift (since it's already computed in the callers) > 979: // con0 is assumed to be masked already (as computed by maskShiftAmount) and non-zero > 980: // bt must be T_LONG or T_INT. Suggestion: // We have: // outer_shift = (_ << con0) // We are looking for the pattern: // outer_shift = (inner_shift << con0) // outer_shift = ((x << con1) << con0) // // if con0 + con1 >= nbits => 0 // if con0 + con1 < nbits => x << (con1 + con0) // // Note: con0 and con1 are both in [0..nbits), as they are computed by maskShiftAmount. `bt must be T_LONG or T_INT.` is already stated by the assert. This is just a suggestion, take/modify/ignore it as you wish ;) src/hotspot/share/opto/mulnode.cpp line 983: > 981: static Node* collapse_nested_shift_left(PhaseGVN* phase, Node* outer_shift, int con0, BasicType bt) { > 982: assert(bt == T_LONG || bt == T_INT, "Unexpected type"); > 983: int nbits = bt == T_LONG ? BitsPerJavaLong : BitsPerJavaInteger; Roland is introducing a new method for this in `https://github.com/openjdk/jdk/pull/23438`, see `bits_per_java_integer`. I suggest you use it too ;) src/hotspot/share/opto/mulnode.cpp line 1136: > 1134: > 1135: // Performs: > 1136: // (X << con1) << con2 ==> X << (con1 + con2) (see implementation for subtleties) Suggestion: // (X << con1) << con2 ==> X << (con1 + con2) I would keep it simple. It is usual that there are more comments in the implementation ;) src/hotspot/share/opto/mulnode.cpp line 1322: > 1320: > 1321: // Performs: > 1322: // (X << con1) << con2 ==> X << (con1 + con2) (see implementation for subtleties) Suggestion: // (X << con1) << con2 ==> X << (con1 + con2) ------------- Changes requested by epeter (Reviewer). 
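The nested left-shift rule from the suggested comment above can be checked in plain Java. This is a small sketch under the assumption that both constants are first masked the way Java masks `int` shift counts (to 5 bits); the `collapse` helper is a hypothetical stand-in for this note, not the C2 function `collapse_nested_shift_left`.

```java
public class NestedShiftSketch {
    // Models ((x << con1) << con0) for int, with both counts masked to [0..32).
    static int collapse(int x, int con1, int con0) {
        int c1 = con1 & 31;
        int c0 = con0 & 31;
        int sum = c0 + c1;
        // All bits are shifted out once the combined amount reaches the bit width.
        return sum >= Integer.SIZE ? 0 : x << sum;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 5, -1, 0x12345678, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            for (int con1 = 0; con1 < 32; con1++) {
                for (int con0 = 0; con0 < 32; con0++) {
                    if (((x << con1) << con0) != collapse(x, con1, con0)) {
                        throw new AssertionError("mismatch: x=" + x + " con1=" + con1 + " con0=" + con0);
                    }
                }
            }
        }
        // Naively adding the constants would be wrong, because x << 33 is x << 1 in Java.
        System.out.println((5 << 30) << 3); // 0
        System.out.println(5 << 33);        // 10
    }
}
```

The last two lines correspond to the test mentioned in the commit list, which contrasts `(x << 30) << 3` with `x << 33`.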
PR Review: https://git.openjdk.org/jdk/pull/23728#pullrequestreview-2667099654 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1984970111 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1984977754 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1984984128 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1984985180 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1984989432 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1984988176 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1984995503 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1984993598 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1985023420 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1985026156 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1985028767 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1985032144 From epeter at openjdk.org Fri Mar 7 13:11:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 13:11:40 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v5] In-Reply-To: <0-rhfBkyIO8uh5uioQ4XDoEWGwRdfzah8GJrYvILeDM=.699a73a4-ffd2-42f1-b7df-4c32235b3218@github.com> References: <0-rhfBkyIO8uh5uioQ4XDoEWGwRdfzah8GJrYvILeDM=.699a73a4-ffd2-42f1-b7df-4c32235b3218@github.com> Message-ID: <9mevpop6B5oAf2z2w_wLOzURelZ1mqFFBIJ7Zp_tXNw=.f49b24c6-0e63-4db6-afce-b2c80a3fe797@github.com> On Fri, 7 Mar 2025 12:34:36 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> + comment on why not zerocon > > src/hotspot/share/opto/memnode.cpp line 3534: > >> 3532: // 31 8 7 0 >> 3533: // v[0..7] is meaningful, but v[8..31] is not. 
In this case, num_rejected_bits == 24 >> 3534: // If we do the shift left then right by 24 bits, we get: > > It could be nice if you explicitly denoted the 3 cases, maybe even using some indentation for emphasis: > > Case 1: conIL == conIR > Case 2: conIL > conIR > Case 3: conIL < conIR It could also be nice to say what you are trying to show / prove here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1984986704 From duke at openjdk.org Fri Mar 7 13:11:40 2025 From: duke at openjdk.org (Marc Chevalier) Date: Fri, 7 Mar 2025 13:11:40 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v5] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 09:17:21 GMT, Damon Fenacci wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> + comment on why not zerocon > > test/hotspot/jtreg/compiler/c2/irTests/LShiftINodeIdealizationTests.java line 178: > >> 176: >> 177: short[] arr = new short[1]; >> 178: arr[0] = (short)1; > > What do you think about using random short value here? (I was just thinking it might slightly increase the chances of spotting if something is wrong with the shifts...) That makes sense. I use the random `a` everywhere else, so why not here, too!? So, done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1985034602 From epeter at openjdk.org Fri Mar 7 13:15:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 13:15:01 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v6] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 13:11:39 GMT, Marc Chevalier wrote: >> This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). Care must be taken in the case con1 + con2 is bigger than the number of bits in the integer type. 
In this case, we must simplify to 0. >> >> Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32 bits into a 16 bits integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simple `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR looks like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitude. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. >> >> This also works for multiplications by powers of 2 since they are already translated into shifts. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - use a random number in testing > - Merge branch 'master' into fix/missing-transformation-for-chain-of-shifts-multiplications-by-constants > - + comment on why not zerocon > - comment > - Fix style in the few lines I haven't touched yet > - Remove useless local, with especially helpful name > - rename > - Add test suggested by @dean-long exhibiting the difference between (x << 30) << 3 and x << 33 > - improve simplification of double shifts in stores > - actually return a new node > - ... 
and 6 more: https://git.openjdk.org/jdk/compare/fa1b30c2...8e93a12f test/hotspot/jtreg/compiler/c2/irTests/LShiftLNodeIdealizationTests.java line 222: > 220: public long testDoubleShift9(long x) { > 221: return (x << 62L) << 3L; > 222: } I see that you have quite a few examples here with fixed constants. It would be good to extend this with random constants. You can do that with `static final` fields, as they are constant by the time we JIT compile. private static final int CON0 = RANDOM.nextInt(); private static final int CON1 = RANDOM.nextInt(); @Test public long test(int x) { return (x << CON0) << CON1; } I would give you "bonus points" if you use `Generators.java`, because that produces more interesting constants. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1985039171 From epeter at openjdk.org Fri Mar 7 13:27:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 13:27:54 GMT Subject: RFR: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning [v2] In-Reply-To: <3KvvUSvsuELDH07hemoqlDJNICAU5RfkrysTBOCwXjA=.6d1bf7a5-32d7-476e-8f87-dba083d0d3eb@github.com> References: <3KvvUSvsuELDH07hemoqlDJNICAU5RfkrysTBOCwXjA=.6d1bf7a5-32d7-476e-8f87-dba083d0d3eb@github.com> Message-ID: On Fri, 7 Mar 2025 12:20:05 GMT, Emanuel Peter wrote: >> With https://github.com/openjdk/jdk/pull/23865 I mark the `OpaqueMultiversioning` useless once there are no loops for the multiversioning any more. The idea was that this way the `multiversion_if` would be folded away once the loops are gone, and then we should not encounter the `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`. >> >> But there is a case where we do not remove the `OpaqueMultiversioning` fast enough, see the attached regression test: >> - The loops disappear during IGVN. >> - At the beginning of the next loop-opts we mark the `OpaqueMultiversioning` as useless. 
>> - Later during loop-opts we encounter the useless `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`, and assert. >> - But in the IGVN after this loop-opts phase we would have constant folded the `OpaqueMultiversioning` and `multiversion_if` anyway... we just did not get there fast enough ;) >> >> Hence I propose to just create an explicit bailout for useless `OpaqueMultiversioning` nodes in `PhaseIdealLoop::conditional_move`. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > for Christian @chhagedorn thanks for the review! I addressed your comment :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23943#issuecomment-2706451150 From duke at openjdk.org Fri Mar 7 13:52:01 2025 From: duke at openjdk.org (Marc Chevalier) Date: Fri, 7 Mar 2025 13:52:01 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v6] In-Reply-To: References: Message-ID: <1qH9oxjV9Eb-dIRfROfHvsWs7K7_4pkFwDD19uJKQvA=.453734c9-897d-4938-83f8-0e5c71266f1c@github.com> On Fri, 7 Mar 2025 13:12:18 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: >> >> - use a random number in testing >> - Merge branch 'master' into fix/missing-transformation-for-chain-of-shifts-multiplications-by-constants >> - + comment on why not zerocon >> - comment >> - Fix style in the few lines I haven't touched yet >> - Remove useless local, with especially helpful name >> - rename >> - Add test suggested by @dean-long exhibiting the difference between (x << 30) << 3 and x << 33 >> - improve simplification of double shifts in stores >> - actually return a new node >> - ... 
and 6 more: https://git.openjdk.org/jdk/compare/24483157...8e93a12f > > test/hotspot/jtreg/compiler/c2/irTests/LShiftLNodeIdealizationTests.java line 222: > >> 220: public long testDoubleShift9(long x) { >> 221: return (x << 62L) << 3L; >> 222: } > > I see that you have quite a few examples here with fixed constants. It would be good to extend this with random constants. You can do that with `static final` fields, as they are constant by the time we JIT compile. > > > private static final int CON0 = RANDOM.nextInt(); > private static final int CON1 = RANDOM.nextInt(); > > @Test > public long test(int x) { > return (x << CON0) << CON1; > } > > I would give you "bonus points" if you use `Generators.java`, because that produces more interesting constants. I can also do that, but I think using the constants is more valuable than only random numbers: most of random int or long won't have some corner properties I try to cover here. On top of that, with the comments, they help enumerating clearly different cases to look at. So, I'd be happy to add some randomization, but I strongly feel I should keep the constant ones as well. 
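To make the "fixed corner cases plus randomized constants" idea concrete, here is a minimal stand-alone sketch. It is not the PR's jtreg test: the class and method names are hypothetical, it uses plain `java.util.Random` rather than `Generators.java`, and it only checks the arithmetic identity stated in the PR description for `int` (chained left shifts fold to one shift, or to 0 once the summed shift count reaches 32).

```java
import java.util.Random;

// Hypothetical stand-alone check; class and method names are illustrative
// and not taken from the PR or its jtreg tests.
class DoubleShiftSketch {
    // What the source expression computes: two chained int left shifts.
    static int chained(int x, int c0, int c1) {
        return (x << c0) << c1;
    }

    // The folded form. Java uses only the low 5 bits of an int shift count,
    // so each constant is masked first. If the effective counts sum to 32 or
    // more, every bit of x has been shifted out and the result is 0.
    static int folded(int x, int c0, int c1) {
        int sum = (c0 & 31) + (c1 & 31);
        return sum >= 32 ? 0 : x << sum;
    }

    static void check(int x, int c0, int c1) {
        if (chained(x, c0, c1) != folded(x, c0, c1)) {
            throw new AssertionError("mismatch for x=" + x + " c0=" + c0 + " c1=" + c1);
        }
    }

    public static void main(String[] args) {
        // Hand-picked corner cases, in the spirit of the fixed-constant tests.
        int[][] corners = { {0, 0}, {3, 16}, {16, 16}, {30, 3}, {31, 1}, {31, 31} };
        Random random = new Random(42); // fixed seed so a failure can be replayed
        for (int[] c : corners) {
            for (int i = 0; i < 1_000; i++) {
                check(random.nextInt(), c[0], c[1]);
            }
        }
        // Randomized constants on top, including negative and out-of-range counts.
        for (int i = 0; i < 100_000; i++) {
            check(random.nextInt(), random.nextInt(), random.nextInt());
        }
        // Naively adding the constants is wrong: x << 33 is x << 1 in Java.
        System.out.println(chained(1, 30, 3) + " vs " + (1 << 33));
    }
}
```

The last line prints `0 vs 2`, which is the difference between `(x << 30) << 3` and `x << 33` that the fixed-constant tests are meant to pin down; the random loop only adds breadth around those cases.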
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1985092705 From epeter at openjdk.org Fri Mar 7 13:54:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 13:54:59 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v6] In-Reply-To: <1qH9oxjV9Eb-dIRfROfHvsWs7K7_4pkFwDD19uJKQvA=.453734c9-897d-4938-83f8-0e5c71266f1c@github.com> References: <1qH9oxjV9Eb-dIRfROfHvsWs7K7_4pkFwDD19uJKQvA=.453734c9-897d-4938-83f8-0e5c71266f1c@github.com> Message-ID: On Fri, 7 Mar 2025 13:49:05 GMT, Marc Chevalier wrote: >> test/hotspot/jtreg/compiler/c2/irTests/LShiftLNodeIdealizationTests.java line 222: >> >>> 220: public long testDoubleShift9(long x) { >>> 221: return (x << 62L) << 3L; >>> 222: } >> >> I see that you have quite a few examples here with fixed constants. It would be good to extend this with random constants. You can do that with `static final` fields, as they are constant by the time we JIT compile. >> >> >> private static final int CON0 = RANDOM.nextInt(); >> private static final int CON1 = RANDOM.nextInt(); >> >> @Test >> public long test(int x) { >> return (x << CON0) << CON1; >> } >> >> I would give you "bonus points" if you use `Generators.java`, because that produces more interesting constants. > > I can also do that, but I think using the constants is more valuable than only random numbers: most of random int or long won't have some corner properties I try to cover here. On top of that, with the comments, they help enumerating clearly different cases to look at. So, I'd be happy to add some randomization, but I strongly feel I should keep the constant ones as well. Absolutely, please keep the cases you already have, and add some randomized cases on top of it. > most of random int or long won't have some corner properties I try to cover here Right, that's why it is better to use `Generators.java` rather than just `Random`. 
With `Generators`, we make sure to generate more special values, such as powers of 2. And powers of 2 multiplication can for example be converted to shift, and that seems to be quite relevant here ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1985097087 From duke at openjdk.org Fri Mar 7 14:14:39 2025 From: duke at openjdk.org (Marc Chevalier) Date: Fri, 7 Mar 2025 14:14:39 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v7] In-Reply-To: References: Message-ID: > This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). Care must be taken in the case con1 + con2 is bigger than the number of bits in the integer type. In this case, we must simplify to 0. > > Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32 bits into a 16 bits integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simple `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR looks like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitude. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. > > This also works for multiplications by powers of 2 since they are already translated into shifts. 
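The store simplification described above can be checked at the Java level with a small sketch. This is only an illustration under stated assumptions: the class and method names are hypothetical, and it models the 16-bit store with a `short[]` element rather than looking at C2's IR. It shows that storing `(y << 19) >> 16` into a 16-bit slot leaves the same bits as storing `y << 3`, and that storing `(x << 16) >> 16` leaves the same bits as storing `x`.

```java
// Hypothetical illustration; names are not taken from the PR.
class SignExtendStoreSketch {
    // Store of the sign-extension pattern after the two left shifts were merged.
    static short storeMerged(int y) {
        short[] field = new short[1];
        field[0] = (short) ((y << 19) >> 16);
        return field[0];
    }

    // Store of the single remaining left shift, by 19 - 16 = 3 bits.
    static short storeSimplified(int y) {
        short[] field = new short[1];
        field[0] = (short) (y << 3);
        return field[0];
    }

    // The classic case with equal shift amounts: the store alone truncates.
    static boolean plainTrickIsRedundant(int x) {
        return (short) ((x << 16) >> 16) == (short) x;
    }

    public static void main(String[] args) {
        int[] samples = { 0, 1, -1, 0x1FFF, 0x2000, 0x7FFF, 0x8000, 0x12345678,
                          Integer.MIN_VALUE, Integer.MAX_VALUE };
        for (int v : samples) {
            if (storeMerged(v) != storeSimplified(v) || !plainTrickIsRedundant(v)) {
                throw new AssertionError("mismatch for " + v);
            }
        }
        System.out.println("all samples agree");
    }
}
```

Only the low 16 bits reach the slot, and in both forms those are three zero bits followed by the low 13 bits of `y`, which is why the right shift contributes nothing once the value is stored.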
> > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Rework on the comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23728/files - new: https://git.openjdk.org/jdk/pull/23728/files/8e93a12f..14ee25d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=05-06 Stats: 73 lines in 2 files changed: 55 ins; 0 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/23728.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23728/head:pull/23728 PR: https://git.openjdk.org/jdk/pull/23728 From duke at openjdk.org Fri Mar 7 14:14:40 2025 From: duke at openjdk.org (Marc Chevalier) Date: Fri, 7 Mar 2025 14:14:40 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v5] In-Reply-To: <0-rhfBkyIO8uh5uioQ4XDoEWGwRdfzah8GJrYvILeDM=.699a73a4-ffd2-42f1-b7df-4c32235b3218@github.com> References: <0-rhfBkyIO8uh5uioQ4XDoEWGwRdfzah8GJrYvILeDM=.699a73a4-ffd2-42f1-b7df-4c32235b3218@github.com> Message-ID: On Fri, 7 Mar 2025 12:22:50 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> + comment on why not zerocon > > src/hotspot/share/opto/memnode.cpp line 3523: > >> 3521: // (StoreB ... (valIn) ) >> 3522: // If (conIL > conIR) we are inventing 0 lower bits, and throwing >> 3523: // away upper bits, but we are not introducing garbage bits by above. > > What do you mean with `by above`? I mean from the upper bits, from the left if written like decimal numbers. Rephrasing that. > src/hotspot/share/opto/memnode.cpp line 3527: > >> 3525: // (StoreB ... (LShiftI _ valIn (conIL - conIR)) ) >> 3526: // This case happens when the store source was itself a left shift, that gets merged >> 3527: // into the inner left shift of the sign-extension. 
>
> Hmm, above you were talking about a left and a right shift. But now you seem to be talking about some "source" left shift and an "inner" left shift. It's a bit confusing. Are you talking about this case?
> `(StoreB ... (RShiftI _ (LShiftI _ (LShiftI _ valIn conIL1 ) conIL2 ) conIR) )`
> Where the two left shifts are already combined by Ideal earlier?

Yes indeed. You seem confused about the term "source". I'm happy to change it, but I can't see a better word: a store (or move, or assignment) has a source and a destination, which are often, respectively, the left-hand side and the right-hand side of the expression/statement/instruction. I'm rephrasing that.

> src/hotspot/share/opto/memnode.cpp line 3576:
>
>> 3574:   Node* val = in(MemNode::ValueIn);
>> 3575:   if (val->Opcode() == Op_RShiftI) {
>> 3576:     const TypeInt* conIR = phase->type(val->in(2))->isa_int();
>
> Suggestion:
>
>     Node* shr = in(MemNode::ValueIn);
>     if (shr->Opcode() == Op_RShiftI) {
>       const TypeInt* conIR = phase->type(shr->in(2))->isa_int();
>
> Might as well keep it consistent with `shl` below.

Better indeed. Done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1985124253
PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1985124188
PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1985124661

From duke at openjdk.org Fri Mar 7 14:22:29 2025
From: duke at openjdk.org (Marc Chevalier)
Date: Fri, 7 Mar 2025 14:22:29 GMT
Subject: RFR: 8346989: Deoptimization and re-compilation cycle with C2 compiled code
Message-ID: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>

`Math.*Exact` intrinsics can cause many deopts when used repeatedly with problematic arguments.
This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached.

Benchmarks show that this issue affects every Math.*Exact function, and this fix improves them all.
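For readers without the benchmark sources at hand, the following is a rough sketch of the kind of call pattern meant by "problematic arguments". It is not the JMH benchmark from the PR: the class and method names are hypothetical, and it only demonstrates the Java-level behaviour (an `ArithmeticException` on every overflowing call), not the deoptimization itself.

```java
// Hypothetical sketch; not the MathExact JMH benchmark from the PR.
class ExactOverflowSketch {
    // Calls Math.addExact(a, b) n times and counts how often it overflows.
    // With overflowing arguments every call throws, so a hot loop like this
    // keeps taking the slow, exceptional path.
    static int countOverflows(int a, int b, int n) {
        int overflows = 0;
        for (int i = 0; i < n; i++) {
            try {
                Math.addExact(a, b);
            } catch (ArithmeticException e) {
                overflows++;
            }
        }
        return overflows;
    }

    public static void main(String[] args) {
        // In-bounds arguments: no call overflows.
        System.out.println(countOverflows(1, 1, 1_000));
        // Overflowing arguments: every call overflows.
        System.out.println(countOverflows(Integer.MAX_VALUE, 1, 1_000));
    }
}
```

This prints `0` and then `1000`. The benchmark names that follow (`...InBounds` versus `...Overflow`) appear to correspond to these two argument patterns, though that reading is an assumption based on the names alone.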
tl;dr:
- C1: no problem, no change
- C2:
  - with intrinsics:
    - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
    - without overflow: no problem, no change
  - without intrinsics: no problem, no change

Before the fix:

Benchmark (SIZE) Mode Cnt Score Error Units
MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ± 0.048 ms/op
MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ± 58.238 ms/op
MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ± 0.842 ms/op
MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ± 229.425 ms/op
MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ± 22.244 ms/op
MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ± 279.003 ms/op
MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ± 0.810 ms/op
MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ± 141.792 ms/op
MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ± 12.822 ms/op
MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ± 274.396 ms/op
MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ± 3.316 ms/op
MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ± 71.879 ms/op
MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ± 0.587 ms/op
MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ± 252.137 ms/op
MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ± 5.718 ms/op
MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ± 147.432 ms/op
MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ± 0.027 ms/op
MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ± 30.841 ms/op
MathExact.C1_1.loopNegateLInBounds 1000000 avgt 3 2.422 ± 3.590 ms/op
MathExact.C1_1.loopNegateLOverflow 1000000 avgt 3 638.837 ± 49.512 ms/op
MathExact.C1_1.loopSubtractIInBounds 1000000 avgt 3 1.255 ± 0.799 ms/op
MathExact.C1_1.loopSubtractIOverflow 1000000 avgt 3 637.857 ± 231.804 ms/op
MathExact.C1_1.loopSubtractLInBounds 1000000 avgt 3 1.412 ± 0.602 ms/op
MathExact.C1_1.loopSubtractLOverflow 1000000 avgt 3 642.113 ± 251.349 ms/op
MathExact.C1_2.loopAddIInBounds 1000000 avgt 3 1.748 ± 1.095 ms/op
MathExact.C1_2.loopAddIOverflow 1000000 avgt 3 654.617 ± 287.678 ms/op
MathExact.C1_2.loopAddLInBounds 1000000 avgt 3 2.004 ± 1.655 ms/op
MathExact.C1_2.loopAddLOverflow 1000000 avgt 3 670.791 ± 93.689 ms/op
MathExact.C1_2.loopDecrementIInBounds 1000000 avgt 3 5.306 ± 65.215 ms/op
MathExact.C1_2.loopDecrementIOverflow 1000000 avgt 3 650.425 ± 461.740 ms/op
MathExact.C1_2.loopDecrementLInBounds 1000000 avgt 3 5.484 ± 42.778 ms/op
MathExact.C1_2.loopDecrementLOverflow 1000000 avgt 3 656.747 ± 333.281 ms/op
MathExact.C1_2.loopIncrementIInBounds 1000000 avgt 3 3.077 ± 1.677 ms/op
MathExact.C1_2.loopIncrementIOverflow 1000000 avgt 3 634.510 ± 51.365 ms/op
MathExact.C1_2.loopIncrementLInBounds 1000000 avgt 3 3.902 ± 18.471 ms/op
MathExact.C1_2.loopIncrementLOverflow 1000000 avgt 3 656.465 ± 227.014 ms/op
MathExact.C1_2.loopMultiplyIInBounds 1000000 avgt 3 2.384 ± 10.045 ms/op
MathExact.C1_2.loopMultiplyIOverflow 1000000 avgt 3 624.029 ± 342.084 ms/op
MathExact.C1_2.loopMultiplyLInBounds 1000000 avgt 3 3.247 ± 0.735 ms/op
MathExact.C1_2.loopMultiplyLOverflow 1000000 avgt 3 661.427 ± 100.744 ms/op
MathExact.C1_2.loopNegateIInBounds 1000000 avgt 3 3.061 ± 1.148 ms/op
MathExact.C1_2.loopNegateIOverflow 1000000 avgt 3 645.241 ± 323.824 ms/op
MathExact.C1_2.loopNegateLInBounds 1000000 avgt 3 3.211 ± 0.068 ms/op
MathExact.C1_2.loopNegateLOverflow 1000000 avgt 3 658.846 ± 204.524 ms/op
MathExact.C1_2.loopSubtractIInBounds 1000000 avgt 3 1.717 ± 0.161 ms/op
MathExact.C1_2.loopSubtractIOverflow 1000000 avgt 3 644.287 ± 301.787 ms/op
MathExact.C1_2.loopSubtractLInBounds 1000000 avgt 3 3.976 ± 11.982 ms/op
MathExact.C1_2.loopSubtractLOverflow 1000000 avgt 3 660.871 ± 16.538 ms/op
MathExact.C1_3.loopAddIInBounds 1000000 avgt 3 4.380 ± 42.598 ms/op
MathExact.C1_3.loopAddIOverflow 1000000 avgt 3 686.766 ± 511.146 ms/op
MathExact.C1_3.loopAddLInBounds 1000000 avgt 3 5.445 ± 49.738 ms/op
MathExact.C1_3.loopAddLOverflow 1000000 avgt 3 641.936 ± 32.769 ms/op
MathExact.C1_3.loopDecrementIInBounds 1000000 avgt 3 8.340 ± 69.455 ms/op
MathExact.C1_3.loopDecrementIOverflow 1000000 avgt 3 682.239 ± 212.017 ms/op
MathExact.C1_3.loopDecrementLInBounds 1000000 avgt 3 6.048 ± 0.651 ms/op
MathExact.C1_3.loopDecrementLOverflow 1000000 avgt 3 670.924 ± 42.037 ms/op
MathExact.C1_3.loopIncrementIInBounds 1000000 avgt 3 7.970 ± 63.664 ms/op
MathExact.C1_3.loopIncrementIOverflow 1000000 avgt 3 684.490 ± 197.407 ms/op
MathExact.C1_3.loopIncrementLInBounds 1000000 avgt 3 8.780 ± 86.737 ms/op
MathExact.C1_3.loopIncrementLOverflow 1000000 avgt 3 660.941 ± 172.305 ms/op
MathExact.C1_3.loopMultiplyIInBounds 1000000 avgt 3 3.241 ± 0.567 ms/op
MathExact.C1_3.loopMultiplyIOverflow 1000000 avgt 3 630.455 ± 138.458 ms/op
MathExact.C1_3.loopMultiplyLInBounds 1000000 avgt 3 5.906 ± 0.662 ms/op
MathExact.C1_3.loopMultiplyLOverflow 1000000 avgt 3 693.248 ± 539.146 ms/op
MathExact.C1_3.loopNegateIInBounds 1000000 avgt 3 6.394 ± 7.757 ms/op
MathExact.C1_3.loopNegateIOverflow 1000000 avgt 3 644.722 ± 56.929 ms/op
MathExact.C1_3.loopNegateLInBounds 1000000 avgt 3 7.610 ± 41.533 ms/op
MathExact.C1_3.loopNegateLOverflow 1000000 avgt 3 670.166 ± 14.496 ms/op
MathExact.C1_3.loopSubtractIInBounds 1000000 avgt 3 3.345 ± 1.977 ms/op
MathExact.C1_3.loopSubtractIOverflow 1000000 avgt 3 677.317 ± 22.878 ms/op
MathExact.C1_3.loopSubtractLInBounds 1000000 avgt 3 3.226 ± 0.122 ms/op
MathExact.C1_3.loopSubtractLOverflow 1000000 avgt 3 643.642 ± 65.217 ms/op
MathExact.C2.loopAddIInBounds 1000000 avgt 3 1.217 ± 1.694 ms/op
MathExact.C2.loopAddIOverflow 1000000 avgt 3 3995.424 ± 1177.165 ms/op
MathExact.C2.loopAddLInBounds 1000000 avgt 3 2.404 ± 0.053 ms/op
MathExact.C2.loopAddLOverflow 1000000 avgt 3 3997.984 ± 612.558 ms/op
MathExact.C2.loopDecrementIInBounds 1000000 avgt 3 2.014 ± 0.176 ms/op
MathExact.C2.loopDecrementIOverflow 1000000 avgt 3 3828.615 ± 260.670 ms/op
MathExact.C2.loopDecrementLInBounds 1000000 avgt 3 1.986 ± 1.536 ms/op
MathExact.C2.loopDecrementLOverflow 1000000 avgt 3 4075.934 ± 263.798 ms/op
MathExact.C2.loopIncrementIInBounds 1000000 avgt 3 2.238 ± 6.380 ms/op
MathExact.C2.loopIncrementIOverflow 1000000 avgt 3 3927.929 ± 837.162 ms/op
MathExact.C2.loopIncrementLInBounds 1000000 avgt 3 1.971 ± 1.232 ms/op
MathExact.C2.loopIncrementLOverflow 1000000 avgt 3 3915.202 ± 1024.956 ms/op
MathExact.C2.loopMultiplyIInBounds 1000000 avgt 3 1.175 ± 0.509 ms/op
MathExact.C2.loopMultiplyIOverflow 1000000 avgt 3 3803.719 ± 1583.828 ms/op
MathExact.C2.loopMultiplyLInBounds 1000000 avgt 3 0.937 ± 0.631 ms/op
MathExact.C2.loopMultiplyLOverflow 1000000 avgt 3 4023.742 ± 967.498 ms/op
MathExact.C2.loopNegateIInBounds 1000000 avgt 3 2.129 ± 1.094 ms/op
MathExact.C2.loopNegateIOverflow 1000000 avgt 3 3850.484 ± 464.979 ms/op
MathExact.C2.loopNegateLInBounds 1000000 avgt 3 2.247 ± 9.714 ms/op
MathExact.C2.loopNegateLOverflow 1000000 avgt 3 3911.853 ± 362.961 ms/op
MathExact.C2.loopSubtractIInBounds 1000000 avgt 3 1.141 ± 1.579 ms/op
MathExact.C2.loopSubtractIOverflow 1000000 avgt 3 3917.533 ± 628.485 ms/op
MathExact.C2.loopSubtractLInBounds 1000000 avgt 3 2.232 ± 22.329 ms/op
MathExact.C2.loopSubtractLOverflow 1000000 avgt 3 3995.088 ± 302.549 ms/op
MathExact.C2_no_intrinsics.loopAddIInBounds 1000000 avgt 3 1.488 ± 12.243 ms/op
MathExact.C2_no_intrinsics.loopAddIOverflow 1000000 avgt 3 585.568 ± 106.360 ms/op
MathExact.C2_no_intrinsics.loopAddLInBounds 1000000 avgt 3 2.234 ± 23.010 ms/op
MathExact.C2_no_intrinsics.loopAddLOverflow 1000000 avgt 3 602.290 ± 212.146 ms/op
MathExact.C2_no_intrinsics.loopDecrementIInBounds 1000000 avgt 3 4.705 ± 36.814 ms/op
MathExact.C2_no_intrinsics.loopDecrementIOverflow 1000000 avgt 3 590.212 ± 280.334 ms/op
MathExact.C2_no_intrinsics.loopDecrementLInBounds 1000000 avgt 3 2.374 ± 13.667 ms/op
MathExact.C2_no_intrinsics.loopDecrementLOverflow 1000000 avgt 3 583.053 ± 50.535 ms/op
MathExact.C2_no_intrinsics.loopIncrementIInBounds 1000000 avgt 3 3.966 ± 15.366 ms/op
MathExact.C2_no_intrinsics.loopIncrementIOverflow 1000000 avgt 3 591.683 ± 171.580 ms/op
MathExact.C2_no_intrinsics.loopIncrementLInBounds 1000000 avgt 3 3.682 ± 23.147 ms/op
MathExact.C2_no_intrinsics.loopIncrementLOverflow 1000000 avgt 3 601.325 ± 10.597 ms/op
MathExact.C2_no_intrinsics.loopMultiplyIInBounds 1000000 avgt 3 1.307 ± 0.235 ms/op
MathExact.C2_no_intrinsics.loopMultiplyIOverflow 1000000 avgt 3 570.615 ± 50.808 ms/op
MathExact.C2_no_intrinsics.loopMultiplyLInBounds 1000000 avgt 3 1.087 ± 0.486 ms/op
MathExact.C2_no_intrinsics.loopMultiplyLOverflow 1000000 avgt 3 595.713 ± 162.773 ms/op
MathExact.C2_no_intrinsics.loopNegateIInBounds 1000000 avgt 3 1.874 ± 0.954 ms/op
MathExact.C2_no_intrinsics.loopNegateIOverflow 1000000 avgt 3 596.588 ± 68.081 ms/op
MathExact.C2_no_intrinsics.loopNegateLInBounds 1000000 avgt 3 2.337 ± 12.164 ms/op
MathExact.C2_no_intrinsics.loopNegateLOverflow 1000000 avgt 3 573.711 ± 63.243 ms/op
MathExact.C2_no_intrinsics.loopSubtractIInBounds 1000000 avgt 3 1.085 ± 0.815 ms/op
MathExact.C2_no_intrinsics.loopSubtractIOverflow 1000000 avgt 3 579.489 ± 61.399 ms/op
MathExact.C2_no_intrinsics.loopSubtractLInBounds 1000000 avgt 3 1.020 ± 0.161 ms/op
MathExact.C2_no_intrinsics.loopSubtractLOverflow 1000000 avgt 3 580.578 ± 167.454 ms/op

After:

Benchmark (SIZE) Mode Cnt Score Error Units
MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.369 ± 0.462 ms/op
MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 635.020 ± 106.156 ms/op
MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.371 ± 0.020 ms/op
MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 633.864 ± 72.176 ms/op
MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 2.053 ± 0.330 ms/op
MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 634.675 ± 79.427 ms/op
MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 3.798 ± 38.502 ms/op
MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 650.880 ± 123.220 ms/op
MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 2.305 ± 4.829 ms/op
MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 648.231 ± 39.012 ms/op
MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.627 ± 3.129 ms/op
MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 663.671 ± 446.140 ms/op
MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.479 ± 0.102 ms/op
MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 627.959 ± 297.291 ms/op
MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.718 ± 0.806 ms/op
MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.310 ± 112.686 ms/op
MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.079 ± 2.166 ms/op
MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 640.530 ± 152.489 ms/op
MathExact.C1_1.loopNegateLInBounds 1000000 avgt 3 3.168 ± 16.524 ms/op
MathExact.C1_1.loopNegateLOverflow 1000000 avgt 3 650.823 ± 58.420 ms/op
MathExact.C1_1.loopSubtractIInBounds 1000000 avgt 3 2.325 ± 27.865 ms/op
MathExact.C1_1.loopSubtractIOverflow 1000000 avgt 3 632.198 ± 280.799 ms/op
MathExact.C1_1.loopSubtractLInBounds 1000000 avgt 3 1.478 ± 0.281 ms/op
MathExact.C1_1.loopSubtractLOverflow 1000000 avgt 3 626.481 ± 47.028 ms/op
MathExact.C1_2.loopAddIInBounds 1000000 avgt 3 1.850 ± 0.462 ms/op
MathExact.C1_2.loopAddIOverflow 1000000 avgt 3 640.668 ± 217.610 ms/op
MathExact.C1_2.loopAddLInBounds 1000000 avgt 3 1.823 ± 0.123 ms/op
MathExact.C1_2.loopAddLOverflow 1000000 avgt 3 643.123 ± 174.505 ms/op
MathExact.C1_2.loopDecrementIInBounds 1000000 avgt 3 6.435 ± 54.316 ms/op
MathExact.C1_2.loopDecrementIOverflow 1000000 avgt 3 649.622 ± 15.314 ms/op
MathExact.C1_2.loopDecrementLInBounds 1000000 avgt 3 4.315 ± 26.421 ms/op
MathExact.C1_2.loopDecrementLOverflow 1000000 avgt 3 649.018 ± 386.320 ms/op
MathExact.C1_2.loopIncrementIInBounds 1000000 avgt 3 3.444 ± 1.375 ms/op
MathExact.C1_2.loopIncrementIOverflow 1000000 avgt 3 628.711 ± 51.292 ms/op
MathExact.C1_2.loopIncrementLInBounds 1000000 avgt 3 3.351 ± 0.483 ms/op
MathExact.C1_2.loopIncrementLOverflow 1000000 avgt 3 653.560 ± 160.718 ms/op
MathExact.C1_2.loopMultiplyIInBounds 1000000 avgt 3 1.860 ± 0.633 ms/op
MathExact.C1_2.loopMultiplyIOverflow 1000000 avgt 3 620.883 ± 54.516 ms/op
MathExact.C1_2.loopMultiplyLInBounds 1000000 avgt 3 3.998 ± 16.269 ms/op
MathExact.C1_2.loopMultiplyLOverflow 1000000 avgt 3 671.956 ± 93.092 ms/op
MathExact.C1_2.loopNegateIInBounds 1000000 avgt 3 4.415 ± 44.105 ms/op
MathExact.C1_2.loopNegateIOverflow 1000000 avgt 3 661.902 ± 224.843 ms/op
MathExact.C1_2.loopNegateLInBounds 1000000 avgt 3 3.492 ± 0.738 ms/op
MathExact.C1_2.loopNegateLOverflow 1000000 avgt 3 634.946 ± 150.491 ms/op
MathExact.C1_2.loopSubtractIInBounds 1000000 avgt 3 1.712 ± 0.066 ms/op
MathExact.C1_2.loopSubtractIOverflow 1000000 avgt 3 651.508 ± 76.022 ms/op
MathExact.C1_2.loopSubtractLInBounds 1000000 avgt 3 1.949 ± 0.201 ms/op
MathExact.C1_2.loopSubtractLOverflow 1000000 avgt 3 627.459 ± 26.817 ms/op
MathExact.C1_3.loopAddIInBounds 1000000 avgt 3 7.378 ± 4.301 ms/op
MathExact.C1_3.loopAddIOverflow 1000000 avgt 3 647.275 ± 177.062 ms/op
MathExact.C1_3.loopAddLInBounds 1000000 avgt 3 3.427 ± 0.037 ms/op
MathExact.C1_3.loopAddLOverflow 1000000 avgt 3 643.735 ± 227.934 ms/op
MathExact.C1_3.loopDecrementIInBounds 1000000 avgt 3 5.680 ± 0.497 ms/op
MathExact.C1_3.loopDecrementIOverflow 1000000 avgt 3 666.431 ± 8.006 ms/op
MathExact.C1_3.loopDecrementLInBounds 1000000 avgt 3 6.897 ± 24.615 ms/op
MathExact.C1_3.loopDecrementLOverflow 1000000 avgt 3 683.691 ± 52.892 ms/op
MathExact.C1_3.loopIncrementIInBounds 1000000 avgt 3 5.743 ± 0.602 ms/op
MathExact.C1_3.loopIncrementIOverflow 1000000 avgt 3 670.027 ± 175.208 ms/op
MathExact.C1_3.loopIncrementLInBounds 1000000 avgt 3 6.157 ± 2.876 ms/op
MathExact.C1_3.loopIncrementLOverflow 1000000 avgt 3 673.410 ± 245.939 ms/op
MathExact.C1_3.loopMultiplyIInBounds 1000000 avgt 3 3.220 ± 0.165 ms/op
MathExact.C1_3.loopMultiplyIOverflow 1000000 avgt 3 640.165 ± 505.006 ms/op
MathExact.C1_3.loopMultiplyLInBounds 1000000 avgt 3 7.986 ± 62.547 ms/op
MathExact.C1_3.loopMultiplyLOverflow 1000000 avgt 3 681.282 ± 107.856 ms/op
MathExact.C1_3.loopNegateIInBounds 1000000 avgt 3 7.133 ± 18.111 ms/op
MathExact.C1_3.loopNegateIOverflow 1000000 avgt 3 680.976 ± 285.486 ms/op
MathExact.C1_3.loopNegateLInBounds 1000000 avgt 3 7.405 ± 37.040 ms/op
MathExact.C1_3.loopNegateLOverflow 1000000 avgt 3 681.574 ± 173.484 ms/op
MathExact.C1_3.loopSubtractIInBounds 1000000 avgt 3 3.971 ± 16.942 ms/op
MathExact.C1_3.loopSubtractIOverflow 1000000 avgt 3 655.780 ± 230.793 ms/op
MathExact.C1_3.loopSubtractLInBounds 1000000 avgt 3 3.369 ± 3.844 ms/op
MathExact.C1_3.loopSubtractLOverflow 1000000 avgt 3 634.824 ± 20.350 ms/op
MathExact.C2.loopAddIInBounds 1000000 avgt 3 2.461 ± 2.936 ms/op
MathExact.C2.loopAddIOverflow 1000000 avgt 3 589.095 ± 151.126 ms/op
MathExact.C2.loopAddLInBounds 1000000 avgt 3 0.978 ± 0.604 ms/op
MathExact.C2.loopAddLOverflow 1000000 avgt 3 590.511 ± 64.618 ms/op
MathExact.C2.loopDecrementIInBounds 1000000 avgt 3 1.981 ± 0.443 ms/op
MathExact.C2.loopDecrementIOverflow 1000000 avgt 3 593.578 ± 32.752 ms/op
MathExact.C2.loopDecrementLInBounds 1000000 avgt 3 2.924 ± 29.455 ms/op
MathExact.C2.loopDecrementLOverflow 1000000 avgt 3 601.392 ± 936.568 ms/op
MathExact.C2.loopIncrementIInBounds 1000000 avgt 3 2.697 ± 22.142 ms/op
MathExact.C2.loopIncrementIOverflow 1000000 avgt 3 602.418 ± 199.763 ms/op
MathExact.C2.loopIncrementLInBounds 1000000 avgt 3 1.954 ± 0.396 ms/op
MathExact.C2.loopIncrementLOverflow 1000000 avgt 3 601.183 ± 156.439 ms/op
MathExact.C2.loopMultiplyIInBounds 1000000 avgt 3 1.530 ± 7.954 ms/op
MathExact.C2.loopMultiplyIOverflow 1000000 avgt 3 566.677 ± 45.992 ms/op
MathExact.C2.loopMultiplyLInBounds 1000000 avgt 3 2.184 ± 22.242 ms/op
MathExact.C2.loopMultiplyLOverflow 1000000 avgt 3 600.233 ± 234.648 ms/op
MathExact.C2.loopNegateIInBounds 1000000 avgt 3 2.130 ± 1.028 ms/op
MathExact.C2.loopNegateIOverflow 1000000 avgt 3 593.145 ± 337.886 ms/op
MathExact.C2.loopNegateLInBounds 1000000 avgt 3 2.600 ± 20.795 ms/op
MathExact.C2.loopNegateLOverflow 1000000 avgt 3 592.288 ± 138.321 ms/op
MathExact.C2.loopSubtractIInBounds 1000000 avgt 3 1.081 ± 0.265 ms/op
MathExact.C2.loopSubtractIOverflow 1000000 avgt 3 575.884 ± 200.113 ms/op
MathExact.C2.loopSubtractLInBounds 1000000 avgt 3 1.016 ± 0.792 ms/op
MathExact.C2.loopSubtractLOverflow 1000000 avgt 3 589.873 ± 52.521 ms/op
MathExact.C2_no_intrinsics.loopAddIInBounds 1000000 avgt 3 2.166 ± 10.999 ms/op
MathExact.C2_no_intrinsics.loopAddIOverflow 1000000 avgt 3 586.660 ± 229.451 ms/op
MathExact.C2_no_intrinsics.loopAddLInBounds 1000000 avgt 3 1.054 ± 0.528 ms/op
MathExact.C2_no_intrinsics.loopAddLOverflow 1000000 avgt 3 572.511 ± 76.440 ms/op
MathExact.C2_no_intrinsics.loopDecrementIInBounds 1000000 avgt 3 1.907 ± 0.149 ms/op
MathExact.C2_no_intrinsics.loopDecrementIOverflow 1000000 avgt 3 599.262 ± 600.992 ms/op
MathExact.C2_no_intrinsics.loopDecrementLInBounds 1000000 avgt 3 1.820 ± 0.106 ms/op
MathExact.C2_no_intrinsics.loopDecrementLOverflow 1000000 avgt 3 570.464 ± 44.418 ms/op
MathExact.C2_no_intrinsics.loopIncrementIInBounds 1000000 avgt 3 1.914 ± 0.131 ms/op
MathExact.C2_no_intrinsics.loopIncrementIOverflow 1000000 avgt 3 575.143 ± 160.185 ms/op
MathExact.C2_no_intrinsics.loopIncrementLInBounds 1000000 avgt 3 1.818 ± 0.288 ms/op
MathExact.C2_no_intrinsics.loopIncrementLOverflow 1000000 avgt 3 589.998 ± 33.029 ms/op
MathExact.C2_no_intrinsics.loopMultiplyIInBounds 1000000 avgt 3 1.960 ± 10.135 ms/op
MathExact.C2_no_intrinsics.loopMultiplyIOverflow 1000000 avgt 3 571.497 ± 264.484 ms/op
MathExact.C2_no_intrinsics.loopMultiplyLInBounds 1000000 avgt 3 1.061 ± 0.198 ms/op
MathExact.C2_no_intrinsics.loopMultiplyLOverflow 1000000 avgt 3 585.139 ± 317.175 ms/op
MathExact.C2_no_intrinsics.loopNegateIInBounds 1000000 avgt 3 2.611 ± 22.325 ms/op
MathExact.C2_no_intrinsics.loopNegateIOverflow 1000000 avgt 3 579.911 ± 140.426 ms/op
MathExact.C2_no_intrinsics.loopNegateLInBounds 1000000 avgt 3 2.233 ± 2.774 ms/op
MathExact.C2_no_intrinsics.loopNegateLOverflow 1000000 avgt 3 572.368 ± 81.851 ms/op
MathExact.C2_no_intrinsics.loopSubtractIInBounds 1000000 avgt 3 3.162 ± 38.115 ms/op
MathExact.C2_no_intrinsics.loopSubtractIOverflow 1000000 avgt 3 582.794 ± 65.622 ms/op
MathExact.C2_no_intrinsics.loopSubtractLInBounds 1000000 avgt 3 1.028 ± 0.255 ms/op
MathExact.C2_no_intrinsics.loopSubtractLOverflow 1000000 avgt 3 577.491 ± 69.778 ms/op

Is it worth having intrinsics at all? @eme64 wondered, so I tried with this code:

    public class Test {
        final static int N = 500_000_000;

        public static int test(int i) {
            try {
                return Math.multiplyExact(i, i);
            } catch (Throwable e) {
                return 0;
            }
        }

        public static void loop() {
            for (int i = 0; i < N; i++) {
                test(i % 32_768);
            }
        }

        public static void main(String[] args) {
            loop();
        }
    }

and with many more runs (50 instead of 3), and under a more stable load for the rest of the system.

No intrinsic (inlined Java implementation):

    Benchmark 1: ~/jdk/build/linux-x64/jdk/bin/java -XX:CompileCommand=compileonly,"Test*::test*" -XX:-UseOnStackReplacement Test.java
      Time (mean ± σ): 8.651 s ± 0.902 s [User: 8.517 s, System: 0.155 s]
      Range (min … max): 6.853 s … 10.439 s 50 runs

Always intrinsic (current behavior, and new behavior in absence of overflow, like in this example):

    Benchmark 1: ~/jdk/build/linux-x64/jdk/bin/java -XX:CompileCommand=compileonly,"Test*::test*" -XX:-UseOnStackReplacement Test.java
      Time (mean ± σ): 8.222 s ± 1.024 s [User: 8.090 s, System: 0.155 s]
      Range (min … max): 6.667 s … 10.406 s 50 runs

So it's... not very conclusive, but likely to be a bit useful.
The gap between the means is about 0.4 s, which is less than half the standard deviation. Still, it seems good to have.

From a more theoretical point of view, we can see that the code generated for the intrinsics is mostly a `mul` and a `jo`, while it is much more complicated for the inlined Java (with many `mov`, `movsx`, `cmp` and conditional jumps, looking a lot like the Java code).

Thanks,
Marc

-------------

Commit messages:
 - More exhaustive bench
 - Limit inlining of math Exact operations in case of too many deopts

Changes: https://git.openjdk.org/jdk/pull/23916/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8346989
  Stats: 405 lines in 2 files changed: 404 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/23916.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23916/head:pull/23916

PR: https://git.openjdk.org/jdk/pull/23916

From epeter at openjdk.org Fri Mar 7 14:22:30 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 7 Mar 2025 14:22:30 GMT
Subject: RFR: 8346989: Deoptimization and re-compilation cycle with C2 compiled code
In-Reply-To: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
Message-ID:

On Wed, 5 Mar 2025 12:56:48 GMT, Marc Chevalier wrote:

> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments.
> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached.
>
> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all.
>
> tl;dr:
> - C1: no problem, no change
> - C2:
>   - with intrinsics:
>     - with overflow: clear improvement.
Was way worse than C1, now is similar (~4s => ~600ms) > - without overflow: no problem, no change > - without intrinsics: no problem, no change > > Before the fix: > > Benchmark (SIZE) Mode Cnt Score Error Units > MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 0.048 ms/op > MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op > MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op > MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op > MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op > MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op > MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op > MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op > MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 12.822 ms/op > MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op > MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 3.316 ms/op > MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op > MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op > MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op > MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op > MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op > MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op > MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op > MathExact.C1_1.loopNegateLInBounds 1000000 avgt 3 2.422 ? 3.59... The benchmark generally looks good to me, I only have some minor suggestions ;) Ah. And is this only about `multiplyExact`, or are there other methods affected? Would be nice to extend the benchmark to those as well. 
And yet another idea: you could probably write an IR test that checks that we at first have the compilation with the trap, and another test where we trap too much and then get a different compilation (without the intrinsic?). Plus: the issue title is very generic. I think it should mention something about `Math.*Exact` as well ;) test/micro/org/openjdk/bench/vm/compiler/MultiplyExact.java line 47: > 45: try { > 46: return square(i); > 47: } catch (Throwable e) { Can you catch a more specific exception? Catching very general exceptions can often mask other bugs. I suppose this is only a benchmark, but it would still be good practice ;) test/micro/org/openjdk/bench/vm/compiler/MultiplyExact.java line 62: > 60: > 61: @Fork(value = 1) > 62: public static class C2 extends MultiplyExact {} What about a C2 version where you just disable the intrinsic? ------------- PR Review: https://git.openjdk.org/jdk/pull/23916#pullrequestreview-2663529726 PR Comment: https://git.openjdk.org/jdk/pull/23916#issuecomment-2703023122 PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r1982809388 PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r1982808076 From epeter at openjdk.org Fri Mar 7 14:22:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 14:22:30 GMT Subject: RFR: 8346989: Deoptimization and re-compilation cycle with C2 compiled code In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: On Thu, 6 Mar 2025 07:16:40 GMT, Emanuel Peter wrote: >> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. >> >> tl;dr: >> - C1: no problem, no change >> - C2: >> - with intrinsics: >> - with overflow: clear improvement. 
Was way worse than C1, now is similar (~4s => ~600ms) >> - without overflow: no problem, no change >> - without intrinsics: no problem, no change >> >> Before the fix: >> >> Benchmark (SIZE) Mode Cnt Score Error Units >> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 0.048 ms/op >> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op >> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op >> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op >> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op >> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op >> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op >> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op >> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 12.822 ms/op >> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op >> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 3.316 ms/op >> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op >> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op >> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op >> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op >> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op >> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op >> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op >> MathExact.C1_1.loop... > > The benchmark generally looks good to me, I only have some minor suggestions ;) > Is it worth inlining at all? @eme64 wondered, so I tried with this code: You ask this in the PR description. I think I was not thinking about `inlining` but rather using the `intrinsic`. How much speedup does the intrinsic really deliver? Is it really better than pure Java? 
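
For readers wondering what "pure Java" means in this comparison: a non-intrinsified `Math.multiplyExact(int, int)` amounts to widening to `long`, multiplying, and checking that the product still fits in an `int`. The sketch below illustrates that approach; the class and method names are made up for this example and are not taken from the PR.

```java
final class MultiplyExactSketch {
    // Hypothetical stand-in for the library fallback: widen to long, multiply,
    // then check that narrowing back to int loses no information.
    static int multiplyExactPureJava(int x, int y) {
        long r = (long) x * (long) y;
        if ((int) r != r) {
            throw new ArithmeticException("integer overflow");
        }
        return (int) r;
    }

    public static void main(String[] args) {
        // 46_340 * 46_340 still fits in an int; 46_341 * 46_341 does not.
        System.out.println(multiplyExactPureJava(46_340, 46_340));
        try {
            multiplyExactPureJava(46_341, 46_341);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

The widening, comparison and branch are what show up as the extra `movsx`/`cmp`/jump instructions in compiled code, whereas an intrinsic can use the hardware overflow flag directly.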
-------------

PR Comment: https://git.openjdk.org/jdk/pull/23916#issuecomment-2703015476

From duke at openjdk.org Fri Mar 7 14:22:30 2025
From: duke at openjdk.org (Marc Chevalier)
Date: Fri, 7 Mar 2025 14:22:30 GMT
Subject: RFR: 8346989: Deoptimization and re-compilation cycle with C2 compiled code
In-Reply-To:
References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
Message-ID:

On Thu, 6 Mar 2025 07:19:48 GMT, Emanuel Peter wrote:

> You ask this in the PR description. I think I was not thinking about inlining but rather using the intrinsic. How much speedup does the intrinsic really deliver? Is it really better than pure Java?

My fault. I used "inline" instead of "intrinsic" because the functions implementing the intrinsic are called `inline_math_mathExact` and the like. So I compared the intrinsic with the pure Java implementation, which happens to be inlined, and the intrinsic is a bit better. I'll edit the text to fix that.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23916#issuecomment-2703823132

From duke at openjdk.org Fri Mar 7 14:22:30 2025
From: duke at openjdk.org (Marc Chevalier)
Date: Fri, 7 Mar 2025 14:22:30 GMT
Subject: RFR: 8346989: Deoptimization and re-compilation cycle with C2 compiled code
In-Reply-To:
References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
Message-ID: <7npMvWN2HNTIZOpeIVuhrZM9i5YiZEDvJC6xlReut_4=.e8a98a0b-7146-44a7-94e1-0d4a27566f1f@github.com>

On Thu, 6 Mar 2025 07:11:40 GMT, Emanuel Peter wrote:

>> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments.
>> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached.
>>
>> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all.
>>
>> tl;dr:
>> - C1: no problem, no change
>> - C2:
>>   - with intrinsics:
>>     - with overflow: clear improvement.
Was way worse than C1, now is similar (~4s => ~600ms) >> - without overflow: no problem, no change >> - without intrinsics: no problem, no change >> >> Before the fix: >> >> Benchmark (SIZE) Mode Cnt Score Error Units >> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 0.048 ms/op >> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op >> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op >> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op >> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op >> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op >> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op >> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op >> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 12.822 ms/op >> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op >> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 3.316 ms/op >> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op >> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op >> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op >> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op >> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op >> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op >> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op >> MathExact.C1_1.loop... > > test/micro/org/openjdk/bench/vm/compiler/MultiplyExact.java line 47: > >> 45: try { >> 46: return square(i); >> 47: } catch (Throwable e) { > > Can you catch a more specific exception? Catching very general exceptions can often mask other bugs. I suppose this is only a benchmark, but it would still be good practice ;) Indeed. 
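
As an illustration of the review point above, here is a minimal sketch of the narrower catch. It assumes the only failure the benchmark wants to absorb is the `ArithmeticException` that `Math.multiplyExact` throws on overflow; the class and method names are hypothetical and do not come from the benchmark file.

```java
final class CatchSpecificSketch {
    // Returns i * i, or 0 when the product overflows an int.
    // Only ArithmeticException is caught, so any unrelated error
    // still propagates instead of being silently turned into 0.
    static int squareOrZero(int i) {
        try {
            return Math.multiplyExact(i, i);
        } catch (ArithmeticException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(squareOrZero(3));      // in bounds
        System.out.println(squareOrZero(50_000)); // overflows, handled
    }
}
```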
> test/micro/org/openjdk/bench/vm/compiler/MultiplyExact.java line 62:
>
>> 60:
>> 61: @Fork(value = 1)
>> 62: public static class C2 extends MultiplyExact {}
>
> What about a C2 version where you just disable the intrinsic?

Good idea. Done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r1985004497
PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r1985003664

From dfenacci at openjdk.org Fri Mar 7 14:24:11 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Fri, 7 Mar 2025 14:24:11 GMT
Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v4]
In-Reply-To:
References:
Message-ID:

> # Issue
>
> The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` test has been failing because the C2 compilation of the test `testAndMaskSameValue1` is expected to contain 1 `AndV` node but contains none.
>
> # Cause
>
> The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type.
> The graph that leads to the issue looks like this:
> ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a)
> The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds:
> ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec)
> The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead.
> The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`.
> > This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match. > > # Solution > > Instead of fixing this specific case we try a more generic approach: when late inlining we keep track of failed intrinsics and re-examine them during IGVN. If the `Ideal` method for their call node is called, we reschedule the intrinsic attempt for that call. > > # Testing > > Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) > > Tests: Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8302459: create method to prepend and reset generator ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21682/files - new: https://git.openjdk.org/jdk/pull/21682/files/e71e72f5..d688fcfe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21682&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21682&range=02-03 Stats: 16 lines in 2 files changed: 6 ins; 5 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21682/head:pull/21682 PR: https://git.openjdk.org/jdk/pull/21682 From duke at openjdk.org Fri Mar 7 14:29:24 2025 From: duke at openjdk.org (Marc Chevalier) Date: Fri, 7 Mar 2025 14:29:24 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v8] In-Reply-To: References: Message-ID: > This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). 
Care must be taken when con1 + con2 is greater than or equal to the number of bits in the integer type: in that case, we must simplify to 0.
>
> Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32-bit integer into a 16-bit integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simply `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR look like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shifts have different magnitudes. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed so as not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference.
>
> This also works for multiplications by powers of 2 since they are already translated into shifts.
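
The identities described above can be sanity-checked in plain Java. The sketch below is not the C2 implementation; the class and method names are invented for illustration, and it assumes both shift amounts are already in the range [0, 32). Because Java's `<<` masks the shift amount to its low 5 bits for `int`, the collapsed form needs an explicit zero case once the summed amount reaches 32.

```java
final class ShiftChainSketch {
    // Reference: two consecutive left shifts, exactly as written in source.
    static int twoShifts(int x, int con1, int con2) {
        return (x << con1) << con2;
    }

    // Collapsed form. Assumes con1 and con2 are each in [0, 32).
    // A plain x << (con1 + con2) would be wrong for sums >= 32,
    // since Java would mask the amount instead of producing 0.
    static int collapsed(int x, int con1, int con2) {
        int sum = con1 + con2;
        return sum >= Integer.SIZE ? 0 : x << sum;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x1234_5678, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            for (int c1 = 0; c1 < Integer.SIZE; c1++) {
                for (int c2 = 0; c2 < Integer.SIZE; c2++) {
                    if (twoShifts(x, c1, c2) != collapsed(x, c1, c2)) {
                        throw new AssertionError("mismatch for x=" + x + " c1=" + c1 + " c2=" + c2);
                    }
                }
            }
            // Sign-extension trick applied to an operand that is itself a left shift.
            if ((((x << 3) << 16) >> 16) != ((x << 19) >> 16)) {
                throw new AssertionError("sign-extension mismatch for x=" + x);
            }
        }
        System.out.println("all identities hold");
    }
}
```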
> > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: reworked another comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23728/files - new: https://git.openjdk.org/jdk/pull/23728/files/14ee25d1..f8824096 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=06-07 Stats: 14 lines in 1 file changed: 8 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23728.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23728/head:pull/23728 PR: https://git.openjdk.org/jdk/pull/23728 From duke at openjdk.org Fri Mar 7 14:29:24 2025 From: duke at openjdk.org (Marc Chevalier) Date: Fri, 7 Mar 2025 14:29:24 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v5] In-Reply-To: <0-rhfBkyIO8uh5uioQ4XDoEWGwRdfzah8GJrYvILeDM=.699a73a4-ffd2-42f1-b7df-4c32235b3218@github.com> References: <0-rhfBkyIO8uh5uioQ4XDoEWGwRdfzah8GJrYvILeDM=.699a73a4-ffd2-42f1-b7df-4c32235b3218@github.com> Message-ID: <-0jWIP46p163vxnWPBy3FCXXYFJUGyTctjZh-r-p57Y=.40ac83bd-806d-49f5-9cbe-bcbed99ff06f@github.com> On Fri, 7 Mar 2025 12:59:22 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> + comment on why not zerocon > > src/hotspot/share/opto/mulnode.cpp line 980: > >> 978: // con0 is the rhs of outer_shift (since it's already computed in the callers) >> 979: // con0 is assumed to be masked already (as computed by maskShiftAmount) and non-zero >> 980: // bt must be T_LONG or T_INT. 
> > Suggestion: > > // We have: > // outer_shift = (_ << con0) > // We are looking for the pattern: > // outer_shift = (inner_shift << con0) > // outer_shift = ((x << con1) << con0) > // > // if con0 + con1 >= nbits => 0 > // if con0 + con1 < nbits => x << (con1 + con0) > // > // Note: con0 and con1 are both in [0..nbits), as they are computed by maskShiftAmount. > > `bt must be T_LONG or T_INT.` is already stated by the assert. > > This is just a suggestion, take/modify/ignore it as you wish ;) I took and modified slightly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1985147031 From epeter at openjdk.org Fri Mar 7 14:32:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 14:32:16 GMT Subject: RFR: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning [v3] In-Reply-To: References: Message-ID: > With https://github.com/openjdk/jdk/pull/23865 I mark the `OpaqueMultiversioning` useless once there are no loops for the multiversioning any more. The idea was that this way the `multiversion_if` would be folded away once the loops are gone, and then we should not encounter the `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`. > > But there is a case where we do not remove the `OpaqueMultiversioning` fast enough, see the attached regression test: > - The loops disappear during IGVN. > - At the beginning of the next loop-opts we mark the `OpaqueMultiversioning` as useless. > - Later during loop-opts we encounter the useless `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`, and assert. > - But in the IGVN after this loop-opts phase we would have constant folded the `OpaqueMultiversioning` and `multiversion_if` anyway... we just did not get there fast enough ;) > > Hence I propose to just create an explicit bailout for useless `OpaqueMultiversioning` nodes in `PhaseIdealLoop::conditional_move`. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23943/files - new: https://git.openjdk.org/jdk/pull/23943/files/e852e377..563348e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23943&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23943&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23943.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23943/head:pull/23943 PR: https://git.openjdk.org/jdk/pull/23943 From thartmann at openjdk.org Fri Mar 7 14:32:16 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 7 Mar 2025 14:32:16 GMT Subject: RFR: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning [v2] In-Reply-To: <3KvvUSvsuELDH07hemoqlDJNICAU5RfkrysTBOCwXjA=.6d1bf7a5-32d7-476e-8f87-dba083d0d3eb@github.com> References: <3KvvUSvsuELDH07hemoqlDJNICAU5RfkrysTBOCwXjA=.6d1bf7a5-32d7-476e-8f87-dba083d0d3eb@github.com> Message-ID: On Fri, 7 Mar 2025 12:20:05 GMT, Emanuel Peter wrote: >> With https://github.com/openjdk/jdk/pull/23865 I mark the `OpaqueMultiversioning` useless once there are no loops for the multiversioning any more. The idea was that this way the `multiversion_if` would be folded away once the loops are gone, and then we should not encounter the `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`. >> >> But there is a case where we do not remove the `OpaqueMultiversioning` fast enough, see the attached regression test: >> - The loops disappear during IGVN. >> - At the beginning of the next loop-opts we mark the `OpaqueMultiversioning` as useless. >> - Later during loop-opts we encounter the useless `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`, and assert. 
>> - But in the IGVN after this loop-opts phase we would have constant folded the `OpaqueMultiversioning` and `multiversion_if` anyway... we just did not get there fast enough ;) >> >> Hence I propose to just create an explicit bailout for useless `OpaqueMultiversioning` nodes in `PhaseIdealLoop::conditional_move`. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > for Christian Looks good to me. test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java line 133: > 131: // Then the loops disappear during IGVN, and in the next loop-opts phase, the > 132: // OpaqueMultiversioning is marked useless, but then we already run > 133: // PhaseIdealLoop::conditional_move before the next IGVN round, and find a Suggestion: // PhaseIdealLoop::conditional_move before the next IGVN round, and find a ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23943#pullrequestreview-2667405463 PR Review Comment: https://git.openjdk.org/jdk/pull/23943#discussion_r1985147772 From epeter at openjdk.org Fri Mar 7 14:32:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 14:32:16 GMT Subject: RFR: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning [v2] In-Reply-To: References: <3KvvUSvsuELDH07hemoqlDJNICAU5RfkrysTBOCwXjA=.6d1bf7a5-32d7-476e-8f87-dba083d0d3eb@github.com> Message-ID: On Fri, 7 Mar 2025 14:26:45 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> for Christian > > Looks good to me. 
@TobiHartmann Thanks for the review, I applied your suggestion :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23943#issuecomment-2706586623 From dfenacci at openjdk.org Fri Mar 7 14:49:06 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 7 Mar 2025 14:49:06 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v3] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 16:22:30 GMT, Vladimir Ivanov wrote: >> Damon Fenacci has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 40 commits: >> >> - JDK-8302459: unneeded changes >> - JDK-8302459: unneeded changes >> - JDK-8302459: update assert string >> - JDK-8302459: fix copyright year >> - JDK-8302459: fix after merge >> - Merge branch 'master' into JDK-8302459-new >> - JDK-8302459: add logging >> - JDK-8302459: remove todos >> - JDK-8302459: add check to avoid infinite loop >> - Merge branch 'master' into JDK-8302459-new >> - ... and 30 more: https://git.openjdk.org/jdk/compare/a637ccf2...e71e72f5 > > src/hotspot/share/opto/callnode.cpp line 1112: > >> 1110: "static call node changed: trying again"); >> 1111: } >> 1112: phase->C->prepend_late_inline(cg); > > There are 4 occurrences of `prepend_late_inline` followed by `set_generator(nullptr)`. Does it deserve a helper method? It surely does. I added it. > src/hotspot/share/opto/compile.cpp line 2044: > >> 2042: break; // process one call site at a time >> 2043: } else { >> 2044: if (C->igvn_worklist()->member(cg->call_node()) == is_scheduled_for_igvn_before) { // avoid potential infinite loop > > Can you remind me, please, what exactly we are trying to catch here? > I remember I expressed concerns about the call node being scheduled for IGVN during incremental inlining attempt causing infinite loop during incremental inlining. 
Does the same apply if the node disappears from IGVN work list during incremental inlining attempt?
>
> (It took me some time to recollect what's going on here. Maybe introduce `is_scheduled_for_igvn_after` local and add a comment why both mismatches - `false -> true` and `true -> false` - are problematic?)

Thanks a lot for having a look @iwanowww!
It took me a while to recollect it too (and I remember having a hard time figuring out whether that could be an issue back then).
Anyway, the concern, as you said, was that there might be an infinite loop between IGVN and incremental inlining (presumably because during incremental inlining the call node could potentially slip back into the working list, right?).
If that is the root of the problem, the issue would only exist in the `false -> true` case. In the (potential) `true -> false` case the call node wouldn't be scheduled for IGVN in the next round, so there wouldn't be any loop. Maybe we could even transform the statement into something like:

    if (C->igvn_worklist()->member(cg->call_node()) && is_scheduled_for_igvn_before) {
      cg->call_node()->set_generator(cg);
    }

What do you think?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r1985179449
PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r1985179601

From thartmann at openjdk.org Fri Mar 7 14:55:02 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 7 Mar 2025 14:55:02 GMT
Subject: RFR: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning [v3]
In-Reply-To:
References:
Message-ID: <1-Rwq1io_LcesOaIL6lrx8t38U4MaCpAmkHmZeuB7Sc=.cf3de367-543b-4a23-9cb3-a20475d294f7@github.com>

On Fri, 7 Mar 2025 14:32:16 GMT, Emanuel Peter wrote:

>> With https://github.com/openjdk/jdk/pull/23865 I mark the `OpaqueMultiversioning` useless once there are no loops for the multiversioning any more.
The idea was that this way the `multiversion_if` would be folded away once the loops are gone, and then we should not encounter the `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`.
>>
>> But there is a case where we do not remove the `OpaqueMultiversioning` fast enough, see the attached regression test:
>> - The loops disappear during IGVN.
>> - At the beginning of the next loop-opts we mark the `OpaqueMultiversioning` as useless.
>> - Later during loop-opts we encounter the useless `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`, and assert.
>> - But in the IGVN after this loop-opts phase we would have constant folded the `OpaqueMultiversioning` and `multiversion_if` anyway... we just did not get there fast enough ;)
>>
>> Hence I propose to just create an explicit bailout for useless `OpaqueMultiversioning` nodes in `PhaseIdealLoop::conditional_move`.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> Update test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java
>
> Co-authored-by: Tobias Hartmann

Marked as reviewed by thartmann (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/23943#pullrequestreview-2667494410

From duke at openjdk.org Fri Mar 7 16:11:36 2025
From: duke at openjdk.org (Marc Chevalier)
Date: Fri, 7 Mar 2025 16:11:36 GMT
Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v9]
In-Reply-To:
References:
Message-ID: <-ea_PqhfAG9kdSWC_MsAKeSjgfUazBWr9QB1f_gJUdo=.170e7391-65c3-4aa6-b2f4-779182997156@github.com>

> This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). Care must be taken when con1 + con2 is greater than or equal to the number of bits in the integer type: in that case, we must simplify to 0.
>
> Moreover, the simplification logic of the sign extension trick had to be improved.
For instance, we use `(x << 16) >> 16` to convert a 32-bit integer into a 16-bit integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simply `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR look like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shifts have different magnitudes. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed so as not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference.
>
> This also works for multiplications by powers of 2 since they are already translated into shifts.
>
> Thanks,
> Marc

Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:

  Random testing, trying...
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23728/files - new: https://git.openjdk.org/jdk/pull/23728/files/f8824096..b07d0a2c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=07-08 Stats: 27 lines in 2 files changed: 27 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23728.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23728/head:pull/23728 PR: https://git.openjdk.org/jdk/pull/23728 From duke at openjdk.org Fri Mar 7 16:11:36 2025 From: duke at openjdk.org (Marc Chevalier) Date: Fri, 7 Mar 2025 16:11:36 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v6] In-Reply-To: References: <1qH9oxjV9Eb-dIRfROfHvsWs7K7_4pkFwDD19uJKQvA=.453734c9-897d-4938-83f8-0e5c71266f1c@github.com> Message-ID: On Fri, 7 Mar 2025 13:52:14 GMT, Emanuel Peter wrote: >> I can also do that, but I think using the constants is more valuable than only random numbers: most of random int or long won't have some corner properties I try to cover here. On top of that, with the comments, they help enumerating clearly different cases to look at. So, I'd be happy to add some randomization, but I strongly feel I should keep the constant ones as well. > > Absolutely, please keep the cases you already have, and add some randomized cases on top of it. > >> most of random int or long won't have some corner properties I try to cover here > > Right, that's why it is better to use `Generators.java` rather than just `Random`. With `Generators`, we make sure to generate more special values, such as powers of 2. And powers of 2 multiplication can for example be converted to shift, and that seems to be quite relevant here ;) I've tried something but... It seems too easy. With this way of doing CON0 and CON1, we only test 1 pair of constants at each run, no? 
I thought about a loop to cover ranges, but then, it's not constants for the compiler. So... is that how am I supposed to do? Just one pair, and the number of run of the same test will provide with time some coverage of the parameter space? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1985333230 From epeter at openjdk.org Fri Mar 7 16:18:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Mar 2025 16:18:58 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v6] In-Reply-To: References: <1qH9oxjV9Eb-dIRfROfHvsWs7K7_4pkFwDD19uJKQvA=.453734c9-897d-4938-83f8-0e5c71266f1c@github.com> Message-ID: On Fri, 7 Mar 2025 16:08:43 GMT, Marc Chevalier wrote: >> Absolutely, please keep the cases you already have, and add some randomized cases on top of it. >> >>> most of random int or long won't have some corner properties I try to cover here >> >> Right, that's why it is better to use `Generators.java` rather than just `Random`. With `Generators`, we make sure to generate more special values, such as powers of 2. And powers of 2 multiplication can for example be converted to shift, and that seems to be quite relevant here ;) > > I've tried something but... It seems too easy. With this way of doing CON0 and CON1, we only test 1 pair of constants at each run, no? I thought about a loop to cover ranges, but then, it's not constants for the compiler. > > So... is that how am I supposed to do? Just one pair, and the number of run of the same test will provide with time some coverage of the parameter space? I think just one pair is good enough. This is more of a long-term test, that just fuzzes away every time its run. Such tests have a surprising coverage eventually. 
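The "one random pair per run" idea discussed above could look roughly like the following sketch. It is an assumption-laden stand-in, not the test from the PR: it uses plain `java.util.Random` instead of the `Generators` test library mentioned earlier, and the names `RandomShiftConstants`, `CON0`, `CON1`, and `check` are hypothetical. The point is that `static final` fields are initialized once per JVM run, so the JIT can treat them as constants, while different runs see different pairs.

```java
import java.util.Random;

// Illustrative only: one random constant pair per JVM run, many random inputs.
class RandomShiftConstants {
    private static final Random RANDOM = new Random();

    // Chosen once at class initialization; constant for the rest of the run.
    static final int CON0 = RANDOM.nextInt(Integer.SIZE);
    static final int CON1 = RANDOM.nextInt(Integer.SIZE);

    // The shape under test: a chain of two shifts by (run-time chosen) constants.
    static int test(int x) {
        return (x << CON0) << CON1;
    }

    // Independent reference: shift in 64-bit arithmetic (CON0 + CON1 <= 62),
    // then keep the low 32 bits.
    static int reference(int x) {
        return (int) ((long) x << (CON0 + CON1));
    }

    // Returns true if test and reference agree on the given number of random inputs.
    static boolean check(int iterations) {
        for (int i = 0; i < iterations; i++) {
            int x = RANDOM.nextInt();
            if (test(x) != reference(x)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("CON0=" + CON0 + " CON1=" + CON1 + " ok=" + check(10_000));
    }
}
```

A loop over all constant pairs would turn `CON0` and `CON1` into variables from the compiler's point of view, which is why coverage of the parameter space has to come from repeated runs instead.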
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1985342871 From sviswanathan at openjdk.org Fri Mar 7 16:35:52 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 7 Mar 2025 16:35:52 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 07:48:07 GMT, Jatin Bhateja wrote: >> Float.float16ToFloat generates wrong vectorized code in product build and asserts in fastdebug/debug when argument is of type byte, int, or long array. The short term solution is to not auto vectorize in these cases. >> >> Review comments are welcome. >> >> Best Regards, >> Sandhya > > test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 62: > >> 60: } >> 61: >> 62: @Test > > Suggestion: > > @Test > @IR(failOn = { IRNode.VECTOR_CAST_HF2F }, applyIfCPUFeatureOr = { "avx512vl", "true", "f16c", "true" }) I left out the IR check because we do intend to vectorize this going forward. Instead the bug fix is verified by checkResult. Also the fix is not specific to Intel platform so if we do add IR check it will need to be generic. @eme64 your thoughts please? Would you like to see an IR check here that vectorization is not happening? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1985366520 From duke at openjdk.org Fri Mar 7 16:40:05 2025 From: duke at openjdk.org (Marc Chevalier) Date: Fri, 7 Mar 2025 16:40:05 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v5] In-Reply-To: <0-rhfBkyIO8uh5uioQ4XDoEWGwRdfzah8GJrYvILeDM=.699a73a4-ffd2-42f1-b7df-4c32235b3218@github.com> References: <0-rhfBkyIO8uh5uioQ4XDoEWGwRdfzah8GJrYvILeDM=.699a73a4-ffd2-42f1-b7df-4c32235b3218@github.com> Message-ID: On Fri, 7 Mar 2025 12:41:06 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> + comment on why not zerocon > > src/hotspot/share/opto/memnode.cpp line 3577: > >> 3575: if (val->Opcode() == Op_RShiftI) { >> 3576: const TypeInt* conIR = phase->type(val->in(2))->isa_int(); >> 3577: if (conIR != nullptr && conIR->is_con() && (conIR->get_con() <= num_rejected_bits)) { > > Can you say why you need `conIR->get_con() <= num_rejected_bits` in a comment? Explained. It deserved a drawing. > src/hotspot/share/opto/mulnode.cpp line 983: > >> 981: static Node* collapse_nested_shift_left(PhaseGVN* phase, Node* outer_shift, int con0, BasicType bt) { >> 982: assert(bt == T_LONG || bt == T_INT, "Unexpected type"); >> 983: int nbits = bt == T_LONG ? BitsPerJavaLong : BitsPerJavaInteger; > > Roland is introducing a new method for this in `https://github.com/openjdk/jdk/pull/23438`, see `bits_per_java_integer`. I suggest you use it too ;) Happily, as soon as this other PR is merged! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1985371538 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1985370818 From sviswanathan at openjdk.org Fri Mar 7 16:45:51 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 7 Mar 2025 16:45:51 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input In-Reply-To: References: Message-ID: <1TIUz8O4xjCwMArQYWlPJ4qRR9SBVpux0cceH9m2X5k=.521532a4-9ea7-4031-aa98-a60ce2c8982a@github.com> On Fri, 7 Mar 2025 05:25:44 GMT, Vladimir Kozlov wrote: >> Float.float16ToFloat generates wrong vectorized code in product build and asserts in fastdebug/debug when argument is of type byte, int, or long array. The short term solution is to not auto vectorize in these cases. >> >> Review comments are welcome. >> >> Best Regards, >> Sandhya > > Good. Thanks a lot @vnkozlov for the review and approval. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23939#issuecomment-2706922534 From jbhateja at openjdk.org Fri Mar 7 17:42:07 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 7 Mar 2025 17:42:07 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value Message-ID: Hi All, This bugfix patch fixes incorrect value computation for Integer/Long. compress APIs. Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value., In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. 
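As a small illustration of why a constant input with a variable mask must not be folded to a constant, consider the sketch below. It does not reproduce the reported failure; the values are arbitrary and the names `CompressDemo` and `compressWith` are made up. It only relies on the documented behaviour of `Integer.compress` (available since JDK 19).

```java
import java.util.Arrays;

// Illustrative only: a fixed source value compressed under several masks.
// Requires JDK 19 or newer for Integer.compress.
class CompressDemo {
    // Applies Integer.compress with the same source value to each mask in turn.
    static int[] compressWith(int src, int[] masks) {
        int[] out = new int[masks.length];
        for (int i = 0; i < masks.length; i++) {
            out[i] = Integer.compress(src, masks[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Source 5 (0b101) under masks 0b101, 0b110 and 0b111.
        System.out.println(Arrays.toString(compressWith(5, new int[] {5, 6, 7})));
    }
}
```

The three results differ even though the source is constant and one of the masks equals the source, so any value range computed for such a node has to cover every mask in the mask's range.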
Kindly review and share your feedback.

Best Regards,
Jatin

-------------

Commit messages:
 - 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value

Changes: https://git.openjdk.org/jdk/pull/23947/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8350896
  Stats: 163 lines in 3 files changed: 145 ins; 15 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/23947.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23947/head:pull/23947

PR: https://git.openjdk.org/jdk/pull/23947

From duke at openjdk.org Fri Mar 7 17:57:05 2025
From: duke at openjdk.org (Manuel Hässig)
Date: Fri, 7 Mar 2025 17:57:05 GMT
Subject: RFR: 8350194: Last 2 parameters of ReturnNode::ReturnNode are swapped in the declaration
Message-ID:

The last two parameters in the declaration of ReturnNode::ReturnNode, `frameptr` and `retadr` were swapped in the declaration compared to the definition. This commit makes the declaration consistent with the definition and the two usages in [`GraphKit::gen_stub()`](https://github.com/openjdk/jdk/blob/5c552a9d64c8116161cb9ef4c777e75a2602a75b/src/hotspot/share/opto/generateOptoStub.cpp#L267) and [`Compile::return_values()`](https://github.com/openjdk/jdk/blob/5c552a9d64c8116161cb9ef4c777e75a2602a75b/src/hotspot/share/opto/parse1.cpp#L879).

Tests: tiers 1 through 3 passed.

-------------

Commit messages:
 - Fix formatting
 - opto/callnode: fix parameter order of ReturnNode::ReturnNode

Changes: https://git.openjdk.org/jdk/pull/23927/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23927&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8350194
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/23927.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23927/head:pull/23927

PR: https://git.openjdk.org/jdk/pull/23927

From thartmann at openjdk.org Fri Mar 7 17:57:05 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 7 Mar 2025 17:57:05 GMT
Subject: RFR: 8350194: Last 2 parameters of ReturnNode::ReturnNode are swapped in the declaration
In-Reply-To:
References:
Message-ID:

On Thu, 6 Mar 2025 07:42:14 GMT, Manuel Hässig wrote:

> The last two parameters in the declaration of ReturnNode::ReturnNode, `frameptr` and `retadr` were swapped in the declaration compared to the definition. This commit makes the declaration consistent with the definition and the two usages in [`GraphKit::gen_stub()`](https://github.com/openjdk/jdk/blob/5c552a9d64c8116161cb9ef4c777e75a2602a75b/src/hotspot/share/opto/generateOptoStub.cpp#L267) and [`Compile::return_values()`](https://github.com/openjdk/jdk/blob/5c552a9d64c8116161cb9ef4c777e75a2602a75b/src/hotspot/share/opto/parse1.cpp#L879).
>
> Tests: tiers 1 through 3 passed.

That looks good and trivial to me. Congratulations on your first PR Manuel! :partying_face:

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23927#pullrequestreview-2663601283

From epeter at openjdk.org Fri Mar 7 17:57:05 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 7 Mar 2025 17:57:05 GMT
Subject: RFR: 8350194: Last 2 parameters of ReturnNode::ReturnNode are swapped in the declaration
In-Reply-To:
References:
Message-ID:

On Thu, 6 Mar 2025 07:42:14 GMT, Manuel Hässig wrote:

> The last two parameters in the declaration of ReturnNode::ReturnNode, `frameptr` and `retadr` were swapped in the declaration compared to the definition. This commit makes the declaration consistent with the definition and the two usages in [`GraphKit::gen_stub()`](https://github.com/openjdk/jdk/blob/5c552a9d64c8116161cb9ef4c777e75a2602a75b/src/hotspot/share/opto/generateOptoStub.cpp#L267) and [`Compile::return_values()`](https://github.com/openjdk/jdk/blob/5c552a9d64c8116161cb9ef4c777e75a2602a75b/src/hotspot/share/opto/parse1.cpp#L879).
>
> Tests: tiers 1 through 3 passed.

Marked as reviewed by epeter (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/23927#pullrequestreview-2667505750

From vlivanov at openjdk.org Fri Mar 7 18:06:56 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 7 Mar 2025 18:06:56 GMT
Subject: RFR: 8346989: Deoptimization and re-compilation cycle with C2 compiled code
In-Reply-To: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
Message-ID: <_3l8ylsbgvsqQE1Ihp0BUAx2o_VzcS6R2jWBSKW9u1E=.0dcb6086-ff6f-4c9a-b990-6665a476a3dc@github.com>

On Wed, 5 Mar 2025 12:56:48 GMT, Marc Chevalier wrote:

> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments.
> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached.
>
> Benchmark show that this issue affects every Math.*Exact functions.
> And this fix improve them all.
>
> tl;dr:
> - C1: no problem, no change
> - C2:
>   - with intrinsics:
>     - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
>     - without overflow: no problem, no change
>   - without intrinsics: no problem, no change
>
> Before the fix:
>
> Benchmark                               (SIZE)  Mode  Cnt    Score     Error  Units
> MathExact.C1_1.loopAddIInBounds        1000000  avgt    3    1.272 ±   0.048  ms/op
> MathExact.C1_1.loopAddIOverflow        1000000  avgt    3  641.917 ±  58.238  ms/op
> MathExact.C1_1.loopAddLInBounds        1000000  avgt    3    1.402 ±   0.842  ms/op
> MathExact.C1_1.loopAddLOverflow        1000000  avgt    3  671.013 ± 229.425  ms/op
> MathExact.C1_1.loopDecrementIInBounds  1000000  avgt    3    3.722 ±  22.244  ms/op
> MathExact.C1_1.loopDecrementIOverflow  1000000  avgt    3  653.341 ± 279.003  ms/op
> MathExact.C1_1.loopDecrementLInBounds  1000000  avgt    3    2.525 ±   0.810  ms/op
> MathExact.C1_1.loopDecrementLOverflow  1000000  avgt    3  656.750 ± 141.792  ms/op
> MathExact.C1_1.loopIncrementIInBounds  1000000  avgt    3    4.621 ±  12.822  ms/op
> MathExact.C1_1.loopIncrementIOverflow  1000000  avgt    3  651.608 ± 274.396  ms/op
> MathExact.C1_1.loopIncrementLInBounds  1000000  avgt    3    2.576 ±   3.316  ms/op
> MathExact.C1_1.loopIncrementLOverflow  1000000  avgt    3  662.216 ±  71.879  ms/op
> MathExact.C1_1.loopMultiplyIInBounds   1000000  avgt    3    1.402 ±   0.587  ms/op
> MathExact.C1_1.loopMultiplyIOverflow   1000000  avgt    3  615.836 ± 252.137  ms/op
> MathExact.C1_1.loopMultiplyLInBounds   1000000  avgt    3    2.906 ±   5.718  ms/op
> MathExact.C1_1.loopMultiplyLOverflow   1000000  avgt    3  655.576 ± 147.432  ms/op
> MathExact.C1_1.loopNegateIInBounds     1000000  avgt    3    2.023 ±   0.027  ms/op
> MathExact.C1_1.loopNegateIOverflow     1000000  avgt    3  639.136 ±  30.841  ms/op
> MathExact.C1_1.loopNegateLInBounds     1000000  avgt    3    2.422 ±   3.59...

Nice benchmark, Marc!

src/hotspot/share/opto/library_call.cpp line 1963:

> 1961:   set_i_o(i_o());
> 1962:
> 1963:   uncommon_trap(Deoptimization::Reason_intrinsic,

What about using `builtin_throw` here?
(Requires some tuning on `builtin_throw` side.) How much does it affect performance? Also, passing `must_throw = true` into `uncommon_trap` may help a bit here as well. ------------- PR Review: https://git.openjdk.org/jdk/pull/23916#pullrequestreview-2667969834 PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r1985476888 From kvn at openjdk.org Fri Mar 7 18:08:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Mar 2025 18:08:55 GMT Subject: RFR: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning [v3] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 14:32:16 GMT, Emanuel Peter wrote: >> With https://github.com/openjdk/jdk/pull/23865 I mark the `OpaqueMultiversioning` useless once there are no loops for the multiversioning any more. The idea was that this way the `multiversion_if` would be folded away once the loops are gone, and then we should not encounter the `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`. >> >> But there is a case where we do not remove the `OpaqueMultiversioning` fast enough, see the attached regression test: >> - The loops disappear during IGVN. >> - At the beginning of the next loop-opts we mark the `OpaqueMultiversioning` as useless. >> - Later during loop-opts we encounter the useless `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`, and assert. >> - But in the IGVN after this loop-opts phase we would have constant folded the `OpaqueMultiversioning` and `multiversion_if` anyway... we just did not get there fast enough ;) >> >> Hence I propose to just create an explicit bailout for useless `OpaqueMultiversioning` nodes in `PhaseIdealLoop::conditional_move`. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java > > Co-authored-by: Tobias Hartmann Good. 
------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23943#pullrequestreview-2667972422 From vlivanov at openjdk.org Fri Mar 7 18:08:56 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Mar 2025 18:08:56 GMT Subject: RFR: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning [v3] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 14:32:16 GMT, Emanuel Peter wrote: >> With https://github.com/openjdk/jdk/pull/23865 I mark the `OpaqueMultiversioning` useless once there are no loops for the multiversioning any more. The idea was that this way the `multiversion_if` would be folded away once the loops are gone, and then we should not encounter the `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`. >> >> But there is a case where we do not remove the `OpaqueMultiversioning` fast enough, see the attached regression test: >> - The loops disappear during IGVN. >> - At the beginning of the next loop-opts we mark the `OpaqueMultiversioning` as useless. >> - Later during loop-opts we encounter the useless `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`, and assert. >> - But in the IGVN after this loop-opts phase we would have constant folded the `OpaqueMultiversioning` and `multiversion_if` anyway... we just did not get there fast enough ;) >> >> Hence I propose to just create an explicit bailout for useless `OpaqueMultiversioning` nodes in `PhaseIdealLoop::conditional_move`. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java > > Co-authored-by: Tobias Hartmann Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23943#pullrequestreview-2667974712 From kvn at openjdk.org Fri Mar 7 18:50:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Mar 2025 18:50:53 GMT Subject: RFR: 8348261: assert(n->is_Mem()) failed: memory node required In-Reply-To: References: Message-ID: <105aN1RJ-Jjy8-BZpdst4EtorZsN82x6w4Vqzh2cBdQ=.a099ffce-bf50-4e32-8ad1-b882c3b01b3f@github.com> On Fri, 7 Mar 2025 00:44:44 GMT, Vladimir Kozlov wrote: > Add missing check for StrInflatedCopy intrinsic in C2 Escape Analysis. > > Very rare case since we not usually use Latin1.inflate(). In failing case we inline both paths in [String.getBytes()](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L4808) and eliminate `TreeMap$EntryIterator allocation: > > > # java.lang.String::getBytes @ bci:40 (line 4812) L[0]=rsp + #64 L[1]=rsp + #132 L[2]=rsp + #120 L[3]=rsp + #124 STK[0]=rsp + #128 STK[1]=#0 STK[2]=rsp + #132 STK[3]=rsp + #120 STK[4]=rsp + #164 > # java.lang.AbstractStringBuilder::putStringAt @ bci:15 (line 1754) L[0]=rsp + #0 L[1]=rsp + #120 L[2]=rsp + #64 > # java.lang.AbstractStringBuilder::append @ bci:30 (line 592) L[0]=rsp + #0 L[1]=rsp + #64 L[2]=rsp + #28 > # java.lang.StringBuilder::append @ bci:2 (line 179) L[0]=rsp + #0 L[1]=rsp + #64 > # java.lang.StringBuilder::append @ bci:5 (line 173) L[0]=rsp + #0 L[1]=rsp + #32 > # sun.util.locale.LocaleExtensions::toID @ bci:100 (line 206) L[0]=RBP L[1]=rsp + #0 L[2]=rsp + #8 L[3]=#ScObj0 L[4]=rsp + #16 L[5]=rsp + #24 L[6]=rsp + #32 > # ScObj0 java/util/TreeMap$EntryIterator={ [expectedModCount :0]=rsp + #140, [next :1]=rsp + #176, [lastReturned :2]=rsp + #16, [this$0 :3]=rsp + #168 } > > > Unfortunately I was not able to create standalone test - it seems requires very particular frequencies of executed paths and used features/flags. The fix was verified with compilation replay file from the bug report. > > I am running testing and will let you know results. 
Thank you, Christian and Emanuel for reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/23938#issuecomment-2707175188 From kvn at openjdk.org Fri Mar 7 19:07:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Mar 2025 19:07:54 GMT Subject: RFR: 8348261: assert(n->is_Mem()) failed: memory node required In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 09:13:33 GMT, Emanuel Peter wrote: >> Add missing check for StrInflatedCopy intrinsic in C2 Escape Analysis. >> >> Very rare case since we not usually use Latin1.inflate(). In failing case we inline both paths in [String.getBytes()](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L4808) and eliminate `TreeMap$EntryIterator allocation: >> >> >> # java.lang.String::getBytes @ bci:40 (line 4812) L[0]=rsp + #64 L[1]=rsp + #132 L[2]=rsp + #120 L[3]=rsp + #124 STK[0]=rsp + #128 STK[1]=#0 STK[2]=rsp + #132 STK[3]=rsp + #120 STK[4]=rsp + #164 >> # java.lang.AbstractStringBuilder::putStringAt @ bci:15 (line 1754) L[0]=rsp + #0 L[1]=rsp + #120 L[2]=rsp + #64 >> # java.lang.AbstractStringBuilder::append @ bci:30 (line 592) L[0]=rsp + #0 L[1]=rsp + #64 L[2]=rsp + #28 >> # java.lang.StringBuilder::append @ bci:2 (line 179) L[0]=rsp + #0 L[1]=rsp + #64 >> # java.lang.StringBuilder::append @ bci:5 (line 173) L[0]=rsp + #0 L[1]=rsp + #32 >> # sun.util.locale.LocaleExtensions::toID @ bci:100 (line 206) L[0]=RBP L[1]=rsp + #0 L[2]=rsp + #8 L[3]=#ScObj0 L[4]=rsp + #16 L[5]=rsp + #24 L[6]=rsp + #32 >> # ScObj0 java/util/TreeMap$EntryIterator={ [expectedModCount :0]=rsp + #140, [next :1]=rsp + #176, [lastReturned :2]=rsp + #16, [this$0 :3]=rsp + #168 } >> >> >> Unfortunately I was not able to create standalone test - it seems requires very particular frequencies of executed paths and used features/flags. The fix was verified with compilation replay file from the bug report. >> >> I am running testing and will let you know results. 
> > src/hotspot/share/opto/escape.cpp line 4725: > >> 4723: } else { >> 4724: #ifdef ASSERT >> 4725: if (!n->is_Mem()) { > > Would it make sense to turn this into a product check? Could we bail-out gracefully at this point? For first question - may be but we hit it first time since JDK 9. I think we should balance what is really needs to be checked in product and what can be done in debug VM. For this case I think checking in debug is fine. For second question - not from EA but we can from compilation. We are in `split_unique_types()` where we start modifying graph already. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23938#discussion_r1985557362 From kvn at openjdk.org Fri Mar 7 19:15:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Mar 2025 19:15:52 GMT Subject: RFR: 8348261: assert(n->is_Mem()) failed: memory node required In-Reply-To: References: Message-ID: <_uet7A5kt9RXIfQ4SV2xtqbbfvzExNHIG9QlKpLpUyQ=.7d708ac1-7b40-4f11-8629-041b20628624@github.com> On Fri, 7 Mar 2025 09:12:20 GMT, Emanuel Peter wrote: > Looks reasonable. > > Is there maybe some kind of stress-flag we could develop? Because you are saying the reproduction depends on some specific probabilities. I think stressing flag `-XX:+StressUnstableIfTraps` added by Tobias may help here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23938#issuecomment-2707219159 From kvn at openjdk.org Fri Mar 7 19:21:57 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Mar 2025 19:21:57 GMT Subject: RFR: 8348261: assert(n->is_Mem()) failed: memory node required In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 00:44:44 GMT, Vladimir Kozlov wrote: > Add missing check for StrInflatedCopy intrinsic in C2 Escape Analysis. > > Very rare case since we not usually use Latin1.inflate(). 
In failing case we inline both paths in [String.getBytes()](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L4808) and eliminate `TreeMap$EntryIterator allocation: > > > # java.lang.String::getBytes @ bci:40 (line 4812) L[0]=rsp + #64 L[1]=rsp + #132 L[2]=rsp + #120 L[3]=rsp + #124 STK[0]=rsp + #128 STK[1]=#0 STK[2]=rsp + #132 STK[3]=rsp + #120 STK[4]=rsp + #164 > # java.lang.AbstractStringBuilder::putStringAt @ bci:15 (line 1754) L[0]=rsp + #0 L[1]=rsp + #120 L[2]=rsp + #64 > # java.lang.AbstractStringBuilder::append @ bci:30 (line 592) L[0]=rsp + #0 L[1]=rsp + #64 L[2]=rsp + #28 > # java.lang.StringBuilder::append @ bci:2 (line 179) L[0]=rsp + #0 L[1]=rsp + #64 > # java.lang.StringBuilder::append @ bci:5 (line 173) L[0]=rsp + #0 L[1]=rsp + #32 > # sun.util.locale.LocaleExtensions::toID @ bci:100 (line 206) L[0]=RBP L[1]=rsp + #0 L[2]=rsp + #8 L[3]=#ScObj0 L[4]=rsp + #16 L[5]=rsp + #24 L[6]=rsp + #32 > # ScObj0 java/util/TreeMap$EntryIterator={ [expectedModCount :0]=rsp + #140, [next :1]=rsp + #176, [lastReturned :2]=rsp + #16, [this$0 :3]=rsp + #168 } > > > Unfortunately I was not able to create standalone test - it seems requires very particular frequencies of executed paths and used features/flags. The fix was verified with compilation replay file from the bug report. > > I am running testing and will let you know results. My testing tier1-5, xcomp and stress passed without new failures. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23938#issuecomment-2707225759 From kvn at openjdk.org Fri Mar 7 19:21:57 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Mar 2025 19:21:57 GMT Subject: Integrated: 8348261: assert(n->is_Mem()) failed: memory node required In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 00:44:44 GMT, Vladimir Kozlov wrote: > Add missing check for StrInflatedCopy intrinsic in C2 Escape Analysis. > > Very rare case since we not usually use Latin1.inflate(). 
In failing case we inline both paths in [String.getBytes()](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L4808) and eliminate `TreeMap$EntryIterator allocation: > > > # java.lang.String::getBytes @ bci:40 (line 4812) L[0]=rsp + #64 L[1]=rsp + #132 L[2]=rsp + #120 L[3]=rsp + #124 STK[0]=rsp + #128 STK[1]=#0 STK[2]=rsp + #132 STK[3]=rsp + #120 STK[4]=rsp + #164 > # java.lang.AbstractStringBuilder::putStringAt @ bci:15 (line 1754) L[0]=rsp + #0 L[1]=rsp + #120 L[2]=rsp + #64 > # java.lang.AbstractStringBuilder::append @ bci:30 (line 592) L[0]=rsp + #0 L[1]=rsp + #64 L[2]=rsp + #28 > # java.lang.StringBuilder::append @ bci:2 (line 179) L[0]=rsp + #0 L[1]=rsp + #64 > # java.lang.StringBuilder::append @ bci:5 (line 173) L[0]=rsp + #0 L[1]=rsp + #32 > # sun.util.locale.LocaleExtensions::toID @ bci:100 (line 206) L[0]=RBP L[1]=rsp + #0 L[2]=rsp + #8 L[3]=#ScObj0 L[4]=rsp + #16 L[5]=rsp + #24 L[6]=rsp + #32 > # ScObj0 java/util/TreeMap$EntryIterator={ [expectedModCount :0]=rsp + #140, [next :1]=rsp + #176, [lastReturned :2]=rsp + #16, [this$0 :3]=rsp + #168 } > > > Unfortunately I was not able to create standalone test - it seems requires very particular frequencies of executed paths and used features/flags. The fix was verified with compilation replay file from the bug report. > > I am running testing and will let you know results. This pull request has now been integrated. 
Changeset: f6a8db28
Author:    Vladimir Kozlov
URL:       https://git.openjdk.org/jdk/commit/f6a8db289e5366845f9518fce7a98538017e9570
Stats:     8 lines in 1 file changed: 8 ins; 0 del; 0 mod

8348261: assert(n->is_Mem()) failed: memory node required

Reviewed-by: chagedorn, epeter

-------------

PR: https://git.openjdk.org/jdk/pull/23938

From vlivanov at openjdk.org Fri Mar 7 19:26:06 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 7 Mar 2025 19:26:06 GMT
Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 7 Mar 2025 14:24:11 GMT, Damon Fenacci wrote:

>> # Issue
>>
>> The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2 compiling the test `testAndMaskSameValue1` expects to have 1 `AndV` node but it has none.
>>
>> # Cause
>>
>> The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type.
>> The graph that leads to the issue looks like this:
>> ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a)
>> The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds:
>> ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec)
>> The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead.
>> The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`.
>>
>> This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match.
>>
>> # Solution
>>
>> Instead of fixing this specific case we try a more generic approach: when late inlining we keep track of failed intrinsics and re-examine them during IGVN. If the `Ideal` method for their call node is called, we reschedule the intrinsic attempt for that call.
>>
>> # Testing
>>
>> Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue)
>>
>> Tests: Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode)
>
> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:
>
>   JDK-8302459: create method to prepend and reset generator

src/hotspot/share/opto/callnode.cpp line 1072:

> 1070: #endif
> 1071:
> 1072: void CallJavaNode::prepend_and_reset_generator(PhaseGVN* phase, CallGenerator* cg) {

`prepend_and_reset_generator` sounds way too verbose. Maybe `register_for_late_inline` instead?

Also, `CallGenerator* cg` argument is redundant. And `phase` is used just to extract `Compile*`.

Either

    void CallJavaNode::register_for_late_inline(Compile* C) {
      if (generator() != nullptr) {
        C->prepend_late_inline(generator());
        set_generator(nullptr);
      } else {
        assert(false, "repeated attempt");
      }
    }

Or even drop `Compile* C` and use `Compile::current()`.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r1985576552 From kvn at openjdk.org Fri Mar 7 19:31:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Mar 2025 19:31:54 GMT Subject: RFR: 8351348: x86_64: remove redundant supports_float16 check In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 13:18:07 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > `supports_float16()` is invoked in `is_intrinsic_available -> is_intrinsic_supported`, so there is no need to call it explicitly. > > Thanks It does affect VM startup very "tiny-bit" ;^) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23932#issuecomment-2707243959 From mli at openjdk.org Fri Mar 7 19:42:58 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 7 Mar 2025 19:42:58 GMT Subject: RFR: 8351348: x86_64: remove redundant supports_float16 check In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 19:29:05 GMT, Vladimir Kozlov wrote: > It does affect VM startup very "tiny-bit" ;^) OK, I'll close this one. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23932#issuecomment-2707263607 From mli at openjdk.org Fri Mar 7 19:42:58 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 7 Mar 2025 19:42:58 GMT Subject: Withdrawn: 8351348: x86_64: remove redundant supports_float16 check In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 13:18:07 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > `supports_float16()` is invoked in `is_intrinsic_available -> is_intrinsic_supported`, so there is no need to call it explicitly. > > Thanks This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/23932 From qamai at openjdk.org Fri Mar 7 20:09:02 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 7 Mar 2025 20:09:02 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v6] In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 22:47:58 GMT, Vladimir Ivanov wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Implement apply_identity > > Were there any experiments conducted to port existing lowering transformations to the new pass? > > As we discussed before, there are multiple places in the code where lowering takes place. It is still not clear to me how much proposed solution unifies across existing use cases. What I'd really like to avoid is yet another peculiar way to perform lowering transformations in C2. @iwanowww Gentle ping on this @jaskarth I think you should merge master to keep the branch reasonably up-to-date ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2707315180 From vlivanov at openjdk.org Fri Mar 7 20:44:54 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Mar 2025 20:44:54 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v3] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 14:45:46 GMT, Damon Fenacci wrote: >> src/hotspot/share/opto/compile.cpp line 2044: >> >>> 2042: break; // process one call site at a time >>> 2043: } else { >>> 2044: if (C->igvn_worklist()->member(cg->call_node()) == is_scheduled_for_igvn_before) { // avoid potential infinite loop >> >> Can you remind me, please, what exactly we are trying to catch here? >> I remember I expressed concerns about the call node being scheduled for IGVN during incremental inlining attempt causing infinite loop during incremental inlining. 
Does the same apply if the node disappears from IGVN work list during incremental inlining attempt? >> >> (It took me some time to recollect what's going on here. Maybe introduce `is_scheduled_for_igvn_after` local and add a comment why both mismatches - `false -> true` and `true -> false` - are problematic?) > > Thanks a lot for having a look @iwanowww! > > It took me a while to recollect it too (and I remember having a hard time figuring out if that could be an issue back then). Anyway the concern, as you said, was that there might be an infinite loop between IGVN and incremental inlining (presumably because during incremental inlining the call node could potentially slip back into the working list, right?). > If that is the root of the problem, the issue would only exist in the `false -> true` case. In the (potential) `true -> false` case the call node wouldn't be scheduled for IGVN in the next round, so there wouldn't be any loop. Maybe we could even transform the statement into something like: >
> if (C->igvn_worklist()->member(cg->call_node()) && is_scheduled_for_igvn_before) {
>   cg->call_node()->set_generator(cg);
> }
>
> What do you think? So, since the current logic for the generic (non-MH) case conservatively assumes that any change in inputs may benefit inlining (and unconditionally schedules such call nodes for another inlining attempt during IGVN), we want to avoid the situation when a call node gets scheduled for IGVN during a failed inlining attempt. I'd shape it as follows:
if (!is_scheduled_for_igvn_before && is_scheduled_for_igvn_after) { // avoid potential infinite loop
  assert(false, "scheduled for IGVN during inlining attempt");
} else {
  assert(is_scheduled_for_igvn_before == is_scheduled_for_igvn_after, "interesting"); // removed from IGVN list during inlining pass?
  cg->call_node()->set_generator(cg); // wait for another opportunity
}
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r1985681022 From vlivanov at openjdk.org Fri Mar 7 20:55:59 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Mar 2025 20:55:59 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v6] In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 02:23:30 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. >> >> In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Implement apply_identity I still have a hard time making any conclusions until I see examples. Skeleton code doesn't say much to me. Also, would be nice to port some existing use cases. Overall, I'd like to build more confidence in general applicability of the proposed design before committing to it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2707432254 From qamai at openjdk.org Fri Mar 7 21:42:02 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 7 Mar 2025 21:42:02 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v6] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 20:53:34 GMT, Vladimir Ivanov wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Implement apply_identity > > I still have a hard time making any conclusions until I see examples. Skeleton code doesn't say much to me. > Also, would be nice to port some existing use cases. > > Overall, I'd like to build more confidence in general applicability of the proposed design before committing to it. @iwanowww There are some examples, most of these are about x86 since that is the architecture I'm most familiar with: #22922 The relative cost of multiplication to left shift and addition is different between each architecture and each data type. For example, on x86, scalar multiplication has the latency being triple of that for shift and addition, so transforming `x * 5` into `(x << 2) + x` is reasonable, while transforming `x * 13` into `(x << 3) + (x << 2) + x` is pretty questionable. However, vector multiplication is a different story, i32 vector multiplication has around 5 times the latency, and i64 vector multiplication is even more expensive. So it is preferable to be more aggressive with this transformation. The story is completely different for AArch64, so we need a completely different heuristic there. #22886 This is a PR taking advantage of this PR. In general, we try to lower the vector node early to take advantage of GVN. While if we try to implement the node in code emission there is no optimization there anymore. Some examples that I have given regarding vector insertion and vector extraction. 
The idea is the same, by expanding early, we can perform idealization and GVN on them, elide redundant nodes. Note that this transformation is only on x86: `ExtractI(v, 5) -> ExtractI(ExtractVector(v, 1), 1)` because the concept of 128-bit "lane" and the fact that scalar value can only interact with 128-bit vectors only exists there. https://bugs.openjdk.org/browse/JDK-8345812 The general concept of a vector rearrange is to shuffle one vector with the index from another vector. However, the underlying machine may not support such shuffles directly. In those cases, we need to emulate that shuffle with other shuffle instructions. For example, consider a shuffle of short vectors `[x0, x1, x2, x3]` and `[y0, y1, y2, y3]`. However, x86 does not have short shuffles before AVX512BW, and it has a byte shuffle, so we transform the index vector into something that when we invoke the byte shuffle using the `x` and the transformed `y`, the result would be as if we have a short shuffle instruction to begin with. This is only done early because an index vector is often used for multiple shuffles with different first operands. And we want to do it reasonably late so that we can transform other things into vector rearrange without having to deal with `VectorLoadShuffleNode`. https://bugs.openjdk.org/browse/JDK-8351434 The slice operation is a vector rearrange with an index vector in the form of `[c, c + 1, c + 2, ...]`. The machine may often have efficient instructions to execute them. As a result, with lowering, we can easily and elegantly transform a general-purpose rearrange into a more efficient slice instruction. Semi-related, there are a lot of shuffle instructions for different use cases, such as int shuffle with constant index, zipping 2 vectors, in-lane shuffle (all elements are shuffled to the same 128-bit lane), and all of them are much more efficient than a full general shuffle instruction. 
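As a rough illustration of the index transformation described for JDK-8345812 above, here is a minimal Java sketch. It uses plain arrays in place of vector registers, and the class and method names (`ShortShuffleViaBytes`, `toByteIndices`, and so on) are made up for this example; none of it is HotSpot or Vector API code. It assumes little-endian byte order inside each short, so short index `i` expands to byte indices `2i` and `2i + 1`.

```java
import java.util.Arrays;

// Hypothetical illustration only: arrays stand in for vector registers.
class ShortShuffleViaBytes {
    // Reference semantics of a short shuffle: out[i] = src[idx[i]].
    static short[] shuffleShorts(short[] src, int[] idx) {
        short[] out = new short[idx.length];
        for (int i = 0; i < idx.length; i++) {
            out[i] = src[idx[i]];
        }
        return out;
    }

    // The only primitive we pretend the machine has: a byte shuffle.
    static byte[] shuffleBytes(byte[] src, int[] idx) {
        byte[] out = new byte[idx.length];
        for (int i = 0; i < idx.length; i++) {
            out[i] = src[idx[i]];
        }
        return out;
    }

    // Transform a short index vector into a byte index vector:
    // short lane k occupies bytes 2k (low) and 2k + 1 (high).
    static int[] toByteIndices(int[] shortIdx) {
        int[] out = new int[shortIdx.length * 2];
        for (int i = 0; i < shortIdx.length; i++) {
            out[2 * i] = 2 * shortIdx[i];
            out[2 * i + 1] = 2 * shortIdx[i] + 1;
        }
        return out;
    }

    // Reinterpret shorts as little-endian bytes.
    static byte[] toBytes(short[] v) {
        byte[] out = new byte[v.length * 2];
        for (int i = 0; i < v.length; i++) {
            out[2 * i] = (byte) v[i];
            out[2 * i + 1] = (byte) (v[i] >> 8);
        }
        return out;
    }

    // Reinterpret little-endian bytes as shorts.
    static short[] toShorts(byte[] b) {
        short[] out = new short[b.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (short) ((b[2 * i] & 0xFF) | (b[2 * i + 1] << 8));
        }
        return out;
    }

    public static void main(String[] args) {
        short[] x = {0x1234, -2, 300, 40}; // data vector
        int[] y = {3, 0, 2, 2};            // index vector
        short[] direct = shuffleShorts(x, y);
        short[] emulated = toShorts(shuffleBytes(toBytes(x), toByteIndices(y)));
        System.out.println(Arrays.equals(direct, emulated));
    }
}
```

The transformed index vector depends only on `y`, which is why doing the transformation once and early pays off when the same index vector drives several shuffles with different first operands.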
Many other nodes are expanded during code emission, it would be better to expand them during lowering instead. These include `Max/Min` nodes, many vector nodes, `Conv2B`, etc. For why it would be suboptimal to do these during other phases, I have expanded on it before, please read the previous comment. Cheers, Quan Anh ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2707509733 From chagedorn at openjdk.org Fri Mar 7 22:24:55 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 7 Mar 2025 22:24:55 GMT Subject: RFR: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning [v3] In-Reply-To: References: Message-ID: <9FoiPNhI-DdO7Fy0CKk2Goh9gzN1_l9PfKOdATU5yjk=.c9f26f31-87e2-4021-8c2b-0558fe0cb3b6@github.com> On Fri, 7 Mar 2025 14:32:16 GMT, Emanuel Peter wrote: >> With https://github.com/openjdk/jdk/pull/23865 I mark the `OpaqueMultiversioning` useless once there are no loops for the multiversioning any more. The idea was that this way the `multiversion_if` would be folded away once the loops are gone, and then we should not encounter the `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`. >> >> But there is a case where we do not remove the `OpaqueMultiversioning` fast enough, see the attached regression test: >> - The loops disappear during IGVN. >> - At the beginning of the next loop-opts we mark the `OpaqueMultiversioning` as useless. >> - Later during loop-opts we encounter the useless `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`, and assert. >> - But in the IGVN after this loop-opts phase we would have constant folded the `OpaqueMultiversioning` and `multiversion_if` anyway... we just did not get there fast enough ;) >> >> Hence I propose to just create an explicit bailout for useless `OpaqueMultiversioning` nodes in `PhaseIdealLoop::conditional_move`. 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java > > Co-authored-by: Tobias Hartmann Thanks for the update, looks good to me, too! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23943#pullrequestreview-2668513677 From dean.long at oracle.com Sat Mar 8 03:24:23 2025 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 7 Mar 2025 19:24:23 -0800 Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache In-Reply-To: <12XVxX3jh5YAN_AXwNYOLJKZfPpmogU_Dcg36Vk7m40=.594da04d-9a08-4856-8934-a6a030cfe1b0@github.com> References: <12XVxX3jh5YAN_AXwNYOLJKZfPpmogU_Dcg36Vk7m40=.594da04d-9a08-4856-8934-a6a030cfe1b0@github.com> Message-ID: <4ccbfd10-b811-40aa-99b1-ef98d9a70b80@oracle.com> On 3/6/25 2:33 PM, Evgeny Astigeevich wrote: > On Neoverse:https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/neoverse-v1-platform-a-new-performance-tier-for-arm: On Neoverse, what's the size of a region, and why must it split the code into separate regions at all? dl From duke at openjdk.org Sat Mar 8 16:00:53 2025 From: duke at openjdk.org (Abdelhak Zaaim) Date: Sat, 8 Mar 2025 16:00:53 GMT Subject: RFR: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning [v3] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 14:32:16 GMT, Emanuel Peter wrote: >> With https://github.com/openjdk/jdk/pull/23865 I mark the `OpaqueMultiversioning` useless once there are no loops for the multiversioning any more. The idea was that this way the `multiversion_if` would be folded away once the loops are gone, and then we should not encounter the `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`. 
>> >> But there is a case where we do not remove the `OpaqueMultiversioning` fast enough, see the attached regression test: >> - The loops disappear during IGVN. >> - At the beginning of the next loop-opts we mark the `OpaqueMultiversioning` as useless. >> - Later during loop-opts we encounter the useless `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`, and assert. >> - But in the IGVN after this loop-opts phase we would have constant folded the `OpaqueMultiversioning` and `multiversion_if` anyway... we just did not get there fast enough ;) >> >> Hence I propose to just create an explicit bailout for useless `OpaqueMultiversioning` nodes in `PhaseIdealLoop::conditional_move`. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java > > Co-authored-by: Tobias Hartmann Marked as reviewed by abdelhak-zaaim at github.com (no known OpenJDK username). ------------- PR Review: https://git.openjdk.org/jdk/pull/23943#pullrequestreview-2669217021 From mdoerr at openjdk.org Sat Mar 8 18:45:31 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 8 Mar 2025 18:45:31 GMT Subject: RFR: 8351456: Build failure with --disable-jvm-feature-shenandoahgc after 8343468 Message-ID: Fix Shenandoah exclusion. Tested on aarch64, not on riscv. 
------------- Commit messages: - 8351456: Build failure with --disable-jvm-feature-shenandoahgc after 8343468 Changes: https://git.openjdk.org/jdk/pull/23955/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23955&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351456 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23955.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23955/head:pull/23955 PR: https://git.openjdk.org/jdk/pull/23955 From kvn at openjdk.org Sun Mar 9 02:03:08 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 9 Mar 2025 02:03:08 GMT Subject: RFR: 8351456: Build failure with --disable-jvm-feature-shenandoahgc after 8343468 In-Reply-To: References: Message-ID: On Sat, 8 Mar 2025 18:41:18 GMT, Martin Doerr wrote: > Fix Shenandoah exclusion. Tested on aarch64, not on riscv. Good. Urgent and trivial - no need for a second review. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23955#pullrequestreview-2669299009 From kvn at openjdk.org Sun Mar 9 02:06:07 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 9 Mar 2025 02:06:07 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v14] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Thu, 6 Mar 2025 12:15:52 GMT, Boris Ulasevich wrote: >> This change relocates mutable data (such as relocations, metadata, jvmci data) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data out of the CodeCache. >> >> OOPs were initially moved to a new mutable data blob, but then moved back to nmethod due to performance issues on dacapo benchmarks on AArch64 with ShenandoahGC (why Shenandoah: it is the only GC with supports_instruction_patching=false - it requires loading from the oops table in compiled code, which takes three instructions for remote data). 
>> >> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. >> >> The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod. Example (statistics collected on the CodeCacheStress benchmark): >> - nmethod_count:134000, total_compilation_time: 510460ms >> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, >> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB >> >> Functional testing: jtreg on arm/aarch/x86. >> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. >> >> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. > > Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - swap matadata and jvmci data in outputs according to data layout > - cleanup > - returning oops back to nmethods. jtreg: Ok, performance: Ok. todo: cleanup > - Address review comments: cleanup, move fields to avoid padding, fix CodeBlob purge to call os::free, fix nmethod::print, update Layout description > - add a separate adrp_movk function to to support targets located more than 4GB away > - Force the use of movk in combination with adrp and ldr instructions to address scenarios > where os::malloc allocates buffers beyond the typical ±4GB range accessible with adrp > - Fixing TestFindInstMemRecursion test fail with XX:+StressReflectiveCode option: > _relocation_size can exceed 64Kb, in this case _metadata_offset do not fit into int16. 
> Fix: use _oops_size int16 field to calculate metadata offset > - removing dead code > - a bit of cleanup and addressing review suggestions > - rework movoop for not_supports_instruction_patching case: correcting in ldr_constant and relocations fixup > - ... and 5 more: https://git.openjdk.org/jdk/compare/cfab88b1...bc8c590c My testing tier1-7, stress, comp passed with one new failure [JDK-8351457](https://bugs.openjdk.org/browse/JDK-8351457) which is not related I think. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21276#pullrequestreview-2669299259 From syan at openjdk.org Sun Mar 9 02:57:58 2025 From: syan at openjdk.org (SendaoYan) Date: Sun, 9 Mar 2025 02:57:58 GMT Subject: RFR: 8351456: Build failure with --disable-jvm-feature-shenandoahgc after 8343468 In-Reply-To: References: Message-ID: On Sat, 8 Mar 2025 18:41:18 GMT, Martin Doerr wrote: > Fix Shenandoah exclusion. Tested on aarch64, not on riscv. I verified both on linux-aarch64(native build) and linux-riscv64(cross build) with --disable-jvm-feature-shenandoahgc, after apply the patch of this PR build success. ------------- Marked as reviewed by syan (Committer). PR Review: https://git.openjdk.org/jdk/pull/23955#pullrequestreview-2669304856 From mdoerr at openjdk.org Sun Mar 9 16:17:59 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Sun, 9 Mar 2025 16:17:59 GMT Subject: RFR: 8351456: Build failure with --disable-jvm-feature-shenandoahgc after 8343468 In-Reply-To: References: Message-ID: On Sat, 8 Mar 2025 18:41:18 GMT, Martin Doerr wrote: > Fix Shenandoah exclusion. Tested on aarch64, not on riscv. Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23955#issuecomment-2708940250 From mdoerr at openjdk.org Sun Mar 9 16:17:59 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Sun, 9 Mar 2025 16:17:59 GMT Subject: Integrated: 8351456: Build failure with --disable-jvm-feature-shenandoahgc after 8343468 In-Reply-To: References: Message-ID: On Sat, 8 Mar 2025 18:41:18 GMT, Martin Doerr wrote: > Fix Shenandoah exclusion. Tested on aarch64, not on riscv. This pull request has now been integrated. Changeset: 857c5371 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/857c53718957283766f6566e5519ab5911cf9f3c Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod 8351456: Build failure with --disable-jvm-feature-shenandoahgc after 8343468 Reviewed-by: kvn, syan ------------- PR: https://git.openjdk.org/jdk/pull/23955 From dnsimon at openjdk.org Sun Mar 9 19:12:34 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Sun, 9 Mar 2025 19:12:34 GMT Subject: RFR: 8346825: [JVMCI] Remove NativeImageReinitialize annotationremoved NativeImageReinitialize annotation Message-ID: The `jdk.vm.ci.common.NativeImageReinitialize` annotation was introduced to reset JVMCI and Graal fields to their default values as they are copied into the libgraal image. Now that class loader separation is used to isolate the JVMCI and Graal classes compiled to produce libgraal from the JVMCI and Graal classes being executed to do the AOT compilation, the need for this field resetting is no longer needed. This PR removes the `NativeImageReinitialize` annotation. 
------------- Commit messages: - removed NativeImageReinitialize annotation Changes: https://git.openjdk.org/jdk/pull/23957/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23957&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346825 Stats: 69 lines in 10 files changed: 0 ins; 44 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/23957.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23957/head:pull/23957 PR: https://git.openjdk.org/jdk/pull/23957 From bulasevich at openjdk.org Mon Mar 10 02:30:08 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 10 Mar 2025 02:30:08 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v11] In-Reply-To: <1wpYPmDFmxBZ5rz947YDVXsYqPCcsQ1lC5GXd7O6SIA=.b0c00bc8-905c-4484-bd1c-b1f6b194fdbd@github.com> References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> <4qam3fEKtXq-7w2fYkhuojgDE73_60todL54yQPhkbQ=.fb1b5c06-73f4-44de-8d78-c26281f2761b@github.com> <1wpYPmDFmxBZ5rz947YDVXsYqPCcsQ1lC5GXd7O6SIA=.b0c00bc8-905c-4484-bd1c-b1f6b194fdbd@github.com> Message-ID: On Fri, 21 Feb 2025 01:39:37 GMT, Dean Long wrote: >> Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Address review comments: cleanup, move fields to avoid padding, fix CodeBlob purge to call os::free, fix nmethod::print, update Layout description >> - add a separate adrp_movk function to to support targets located more than 4GB away >> - Force the use of movk in combination with adrp and ldr instructions to address scenarios >> where os::malloc allocates buffers beyond the typical ±4GB range accessible with adrp >> - Fixing TestFindInstMemRecursion test fail with XX:+StressReflectiveCode option: >> _relocation_size can exceed 64Kb, in this case _metadata_offset do not fit into int16. 
>> Fix: use _oops_size int16 field to calculate metadata offset >> - removing dead code >> - a bit of cleanup and addressing review suggestions >> - rework movoop for not_supports_instruction_patching case: correcting in ldr_constant and relocations fixup >> - remove _code_end_offset >> - update jvm.hotspot.code.CodeBlob class >> - update: mutable data for all CodeBlobs with relocations >> - ... and 2 more: https://git.openjdk.org/jdk/compare/e1d0a9c8...6c3370be > > Wouldn't most adrp+movk instructions for oops be computing the same or nearby base addresses? We could set up a dedicated base pointer to the external oop table at the beginning of the code, then use ldr $oop_table + offset for each oop reference. Or instead of reserving a dedicated register that can't be used for anything else, we could allocate a regular spillable register, at the cost of worse performance if it needed to be spilled. Hi @dean-long, Would you mind doing a re-review of this PR? I have reverted the movement of oops into a separate buffer, as it caused issues on AArch64. All platform-specific details are now removed, making the change much simpler. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2709275653 From never at openjdk.org Mon Mar 10 02:46:03 2025 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 10 Mar 2025 02:46:03 GMT Subject: RFR: 8346825: [JVMCI] Remove NativeImageReinitialize annotationremoved NativeImageReinitialize annotation In-Reply-To: References: Message-ID: On Sun, 9 Mar 2025 19:07:54 GMT, Doug Simon wrote: > The `jdk.vm.ci.common.NativeImageReinitialize` annotation was introduced to reset JVMCI and Graal fields to their default values as they are copied into the libgraal image. Now that class loader separation is used to isolate the JVMCI and Graal classes compiled to produce libgraal from the JVMCI and Graal classes being executed to do the AOT compilation, the need for this field resetting is no longer needed. 
This PR removes the `NativeImageReinitialize` annotation. Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23957#pullrequestreview-2669672002 From duke at openjdk.org Mon Mar 10 04:01:05 2025 From: duke at openjdk.org (duke) Date: Mon, 10 Mar 2025 04:01:05 GMT Subject: Withdrawn: 8346236: Auto vectorization support for various Float16 operations In-Reply-To: References: Message-ID: On Sun, 15 Dec 2024 20:06:40 GMT, Jatin Bhateja wrote: > This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 > > The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). > > Summary of changes included with the patch: > 1. C2 compiler New Vector IR creation. > 2. Auto-vectorization support. > 3. x86 backend implementation. > 4. New IR verification test for each newly supported vector operation. > > Following are the performance numbers of Float16OperationsBenchmark > > System : Intel(R) Xeon(R) Processor code-named Granite rapids > Frequency fixed at 2.5 GHz > > > Baseline > Benchmark (vectorDim) Mode Cnt Score Error Units > Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms > Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms > Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms > Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms > Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms > Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms > Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms > Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms > Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 
2682.764 ops/ms > Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms > Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms > Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms > Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms > Float16OperationsBenchmark.isInfiniteCMovBenchmark 1024 thrpt 2 1962.579 ops/ms... This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/22755 From jbhateja at openjdk.org Mon Mar 10 06:25:38 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 10 Mar 2025 06:25:38 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v2] In-Reply-To: References: Message-ID: > This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 > > The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). > > Summary of changes included with the patch: > 1. C2 compiler New Vector IR creation. > 2. Auto-vectorization support. > 3. x86 backend implementation. > 4. New IR verification test for each newly supported vector operation. 
> > Following are the performance numbers of Float16OperationsBenchmark > > System : Intel(R) Xeon(R) Processor code-named Granite rapids > Frequency fixed at 2.5 GHz > > > Baseline > Benchmark (vectorDim) Mode Cnt Score Error Units > Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms > Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms > Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms > Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms > Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms > Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms > Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms > Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms > Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms > Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms > Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms > Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms > Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms > Float16OperationsBenchmark.isInfiniteCMovBenchmark 1024 thrpt 2 1962.579 ops/ms... Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 - Updating benchmark - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 - Updating copyright - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 - Add MinVHF/MaxVHF to commutative op list - Auto Vectorization support for Float16 operations. ------------- Changes: https://git.openjdk.org/jdk/pull/22755/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=01 Stats: 864 lines in 16 files changed: 801 ins; 10 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/22755.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755 PR: https://git.openjdk.org/jdk/pull/22755 From chagedorn at openjdk.org Mon Mar 10 07:59:58 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 10 Mar 2025 07:59:58 GMT Subject: RFR: 8351345: [IR Framework] Improve reported disabled IR verification messages [v2] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 10:53:45 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this trivial patch? >> Currently, it only reports that some flag is non-whitelisted, but does not print out the flag explicitly, but it's helpful to do so. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > refine test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 602: > 600: boolean debugTest, irTest, nonWhiteListedTest; > 601: > 602: debugTest = Platform.isDebugBuild() && !Platform.isInt() && !Platform.isComp(); I suggest to split this up into three separate checks which generate three separate messages. 
test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 615: > 613: System.out.println("IR verification disabled due to:"); > 614: if (!debugTest) { > 615: System.out.println("\tnot running a debug build (required for PrintIdeal and PrintOptoAssembly), " + I like the improvements so far. But I think we should not use tabs since the size could be specific to the machine we are running on. I suggest to use `-` instead which could use the following structure: IR verification disabled due to the following reason(s): - Reason 1 - Reason 2 ... - Using non-whitelisted JTreg VM or Javaoptions flag(s): - Non-whitelisted flag 1 - Non-whitelisted flag 2 - ... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23931#discussion_r1986765726 PR Review Comment: https://git.openjdk.org/jdk/pull/23931#discussion_r1986769438 From epeter at openjdk.org Mon Mar 10 09:47:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Mar 2025 09:47:59 GMT Subject: RFR: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning [v3] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 14:32:16 GMT, Emanuel Peter wrote: >> With https://github.com/openjdk/jdk/pull/23865 I mark the `OpaqueMultiversioning` useless once there are no loops for the multiversioning any more. The idea was that this way the `multiversion_if` would be folded away once the loops are gone, and then we should not encounter the `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`. >> >> But there is a case where we do not remove the `OpaqueMultiversioning` fast enough, see the attached regression test: >> - The loops disappear during IGVN. >> - At the beginning of the next loop-opts we mark the `OpaqueMultiversioning` as useless. >> - Later during loop-opts we encounter the useless `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`, and assert. 
>> - But in the IGVN after this loop-opts phase we would have constant folded the `OpaqueMultiversioning` and `multiversion_if` anyway... we just did not get there fast enough ;) >> >> Hence I propose to just create an explicit bailout for useless `OpaqueMultiversioning` nodes in `PhaseIdealLoop::conditional_move`. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionRemoveUselessSlowLoop.java > > Co-authored-by: Tobias Hartmann Wow amazing, I got 5 reviews. Thanks @vnkozlov @iwanowww @TobiHartmann @chhagedorn for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23943#issuecomment-2709977972 From epeter at openjdk.org Mon Mar 10 09:47:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Mar 2025 09:47:59 GMT Subject: Integrated: 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 10:35:58 GMT, Emanuel Peter wrote: > With https://github.com/openjdk/jdk/pull/23865 I mark the `OpaqueMultiversioning` useless once there are no loops for the multiversioning any more. The idea was that this way the `multiversion_if` would be folded away once the loops are gone, and then we should not encounter the `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`. > > But there is a case where we do not remove the `OpaqueMultiversioning` fast enough, see the attached regression test: > - The loops disappear during IGVN. > - At the beginning of the next loop-opts we mark the `OpaqueMultiversioning` as useless. > - Later during loop-opts we encounter the useless `OpaqueMultiversioning` in `PhaseIdealLoop::conditional_move`, and assert. > - But in the IGVN after this loop-opts phase we would have constant folded the `OpaqueMultiversioning` and `multiversion_if` anyway...
we just did not get there fast enough ;) > > Hence I propose to just create an explicit bailout for useless `OpaqueMultiversioning` nodes in `PhaseIdealLoop::conditional_move`. This pull request has now been integrated. Changeset: 19b9f11c Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/19b9f11c2ec37ef115c14adcfc31161786d46e95 Stats: 33 lines in 3 files changed: 33 ins; 0 del; 0 mod 8351392: C2 crash: failed: Expected Bool, but got OpaqueMultiversioning Reviewed-by: thartmann, kvn, vlivanov, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/23943 From dfenacci at openjdk.org Mon Mar 10 09:49:26 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 10 Mar 2025 09:49:26 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v5] In-Reply-To: References: Message-ID: > # Issue > > The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2 compiling the test `testAndMaskSameValue1` expects to have 1 `AndV` node but it has none. > > # Cause > > The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type. > The graph that leads to the issue looks like this: > ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) > The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: > ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) > The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead.
> The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`. > > This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match. > > # Solution > > Instead of fixing this specific case we try a more generic approach: when late inlining we keep track of failed intrinsics and re-examine them during IGVN. If the `Ideal` method for their call node is called, we reschedule the intrinsic attempt for that call. > > # Testing > > Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) > > Tests: Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: - JDK-8302459: refactor helper method - JDK-8302459: reshape infinite loop check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21682/files - new: https://git.openjdk.org/jdk/pull/21682/files/d688fcfe..9406c6e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21682&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21682&range=03-04 Stats: 18 lines in 3 files changed: 6 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/21682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21682/head:pull/21682 PR: https://git.openjdk.org/jdk/pull/21682 From dfenacci at openjdk.org Mon Mar 10 09:53:00 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 10 Mar 2025 09:53:00 GMT Subject:
RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v4] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 19:23:13 GMT, Vladimir Ivanov wrote: > `prepend_and_reset_generator` sounds way too verbose. Maybe `register_for_late_inline` instead? > > Or even drop `Compile* C` and use `Compile::current()`. I got rid of all the arguments. Thanks Vladimir. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r1986949527 From dfenacci at openjdk.org Mon Mar 10 09:53:00 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 10 Mar 2025 09:53:00 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v3] In-Reply-To: References: Message-ID: <3yI4U0797UkY12EdKml6fz8QldmjHluo-z6BU9sYRWI=.8a0bbf0e-aedf-4045-b078-11e4198e6257@github.com> On Fri, 7 Mar 2025 20:41:50 GMT, Vladimir Ivanov wrote: >> Thanks a lot for having a look @iwanowww! >> >> It took me a while to recollect it too (and I remember having a hard time figuring out if that could be an issue back then). Anyway the concern, as you said, was that there might be an infinite loop between IGVN and incremental inlining (presumably because during incremental inlining the call node could potentially slip back into the working list, right?). >> If that is the root of the problem, the issue would only exist in the `false -> true` case. In the (potential) `true -> false` case the call node wouldn't be scheduled for IGVN in the next round, so there wouldn't be any loop. Maybe we could even transform the statement into something like: >> >> if (C->igvn_worklist()->member(cg->call_node()) && is_scheduled_for_igvn_before) { >> cg->call_node()->set_generator(cg); >> } >> >> What do you think?
> > So, since current logic for generic (non-MH case) case conservatively assumes that any change in inputs may benefit inlining (and unconditionally schedules such call nodes for another inlining attempt during IGVN), we want to avoid the situation when call node gets scheduled for IGVN during failed inlining attempt. > > I'd shape it as follows: > > if (!is_scheduled_for_igvn_before && is_scheduled_for_igvn_after) { // avoid potential infinite loop > assert(false, "scheduled for IGVN during inlining attempt"); > } else { > assert(is_scheduled_for_igvn_before == is_scheduled_for_igvn_after, "interesting"); // removed from IGVN list during inlining pass? > cg->call_node()->set_generator(cg); // wait for another opportunity > } Fair enough. I reshaped it following your suggestion. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r1986949416 From duke at openjdk.org Mon Mar 10 10:03:54 2025 From: duke at openjdk.org (duke) Date: Mon, 10 Mar 2025 10:03:54 GMT Subject: RFR: 8350325: [PPC64] ConvF2HFIdealizationTests timeouts on Power8 In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 12:02:01 GMT, David Linus Briemann wrote: > Skip ConvF2HFIdealizationTests for Power8 @dbriemann Your change (at version b51dd1181fc04ace00fbec108e8713ff1e0cfa9b) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23692#issuecomment-2710027453 From epeter at openjdk.org Mon Mar 10 10:12:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Mar 2025 10:12:58 GMT Subject: RFR: 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly [v4] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 11:09:07 GMT, Christian Hagedorn wrote: >> This patch is a preparatory patch for fixing https://github.com/openjdk/jdk/pull/23823 (already out for review) without relying on predicate matching during IGVN (currently proposed matching with IGVN but after looking into it and further discussions with @rwestrel, we think it's best to do it outside IGVN). >> >> ### Update Assertion Predicate Killing Mechanism >> The main contribution of this patch is to update the killing mechanism of Assertion Predicates. Currently, we kill an Assertion Predicate by replacing its `Opaque*AssertionPredicate` node with `ConI [int:1]`. This creates problems in our predicate matching code (see for example, [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637)). To fix this and prepare for https://github.com/openjdk/jdk/pull/23823, we move to a different approach. >> >> #### Mark Opaque*AssertionPredicate` Nodes Useless >> Instead of directly inserting constants, we mark `Opaque*AssertionPredicate` nodes useless when we find that an Assertion Predicate should be removed. In the next IGVN phase, we are folding it by taking the success path. This requires the following updates: >> - Introduce `_useless` flags at `Opaque*AssertionPredicate` nodes and associated mark methods. >> - Check the flags in `Value()` and return `TypeInt::ONE` if we find them useless. >> - Adding `cmp()` method for `OpaqueInitializedAssertionPredicate` to avoid commoning up a useless with a useful node. 
>> - `OpaqueTemplateAssertionPredicate` nodes are by design unique to the Template Assertion Predicate because they have non-hashable `OpaqueLoop*` nodes on their input paths. I still added `hash()` that returns `NO_HASH` as an additional guarantee/contract. >> >> #### Update Predicate Iteration Code >> To make this new approach work, we need to update the predicate iteration code. We still match Assertion Predicates with an `OpaqueAssertion*Node` to skip over them but we skip to call the `visit()` method since there should not be any work to be done for killed Assertion Predicates. This includes the additional change: >> - Refactoring `RegularPredicateBlockIterator::for_each()` which became bigger. I extracted some methods and decided to use a field instead of a local variable. This meant that I also had to remove `const` from the `for_each()` methods which I think is fine for a method that does iterations and needs to do some book keeping. >> >> #### Other Updates >> I've also applied some small refactorings of touched code. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update mark_useless for OpaqueMultiversioning Marked as reviewed by epeter (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23941#pullrequestreview-2670415075 From chagedorn at openjdk.org Mon Mar 10 10:12:58 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 10 Mar 2025 10:12:58 GMT Subject: RFR: 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly [v4] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 11:09:07 GMT, Christian Hagedorn wrote: >> This patch is a preparatory patch for fixing https://github.com/openjdk/jdk/pull/23823 (already out for review) without relying on predicate matching during IGVN (currently proposed matching with IGVN but after looking into it and further discussions with @rwestrel, we think it's best to do it outside IGVN). >> >> ### Update Assertion Predicate Killing Mechanism >> The main contribution of this patch is to update the killing mechanism of Assertion Predicates. Currently, we kill an Assertion Predicate by replacing its `Opaque*AssertionPredicate` node with `ConI [int:1]`. This creates problems in our predicate matching code (see for example, [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637)). To fix this and prepare for https://github.com/openjdk/jdk/pull/23823, we move to a different approach. >> >> #### Mark Opaque*AssertionPredicate` Nodes Useless >> Instead of directly inserting constants, we mark `Opaque*AssertionPredicate` nodes useless when we find that an Assertion Predicate should be removed. In the next IGVN phase, we are folding it by taking the success path. This requires the following updates: >> - Introduce `_useless` flags at `Opaque*AssertionPredicate` nodes and associated mark methods. >> - Check the flags in `Value()` and return `TypeInt::ONE` if we find them useless. >> - Adding `cmp()` method for `OpaqueInitializedAssertionPredicate` to avoid commoning up a useless with a useful node. 
>> - `OpaqueTemplateAssertionPredicate` nodes are by design unique to the Template Assertion Predicate because they have non-hashable `OpaqueLoop*` nodes on their input paths. I still added `hash()` that returns `NO_HASH` as an additional guarantee/contract. >> >> #### Update Predicate Iteration Code >> To make this new approach work, we need to update the predicate iteration code. We still match Assertion Predicates with an `OpaqueAssertion*Node` to skip over them but we skip to call the `visit()` method since there should not be any work to be done for killed Assertion Predicates. This includes the additional change: >> - Refactoring `RegularPredicateBlockIterator::for_each()` which became bigger. I extracted some methods and decided to use a field instead of a local variable. This meant that I also had to remove `const` from the `for_each()` methods which I think is fine for a method that does iterations and needs to do some book keeping. >> >> #### Other Updates >> I've also applied some small refactorings of touched code. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update mark_useless for OpaqueMultiversioning Thanks Emanuel for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23941#issuecomment-2710053884 From epeter at openjdk.org Mon Mar 10 10:15:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Mar 2025 10:15:59 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 01:56:49 GMT, Sandhya Viswanathan wrote: > Float.float16ToFloat generates wrong vectorized code in product build and asserts in fastdebug/debug when argument is of type byte, int, or long array. The short term solution is to not auto vectorize in these cases. > > Review comments are welcome. 
> > Best Regards, > Sandhya Changes requested by epeter (Reviewer). test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 29: > 27: * @bug 8350835 > 28: * @summary Test bug fix for JDK-8350835 discovered through Template Framework > 29: * @requires vm.compiler2.enabled Suggestion: Is this restriction necessary? I generally prefer running tests on all platforms, and only restricting IR rules. That way we can get more test coverage with result verification. test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 31: > 29: * @requires vm.compiler2.enabled > 30: * @library /test/lib / > 31: * @run main/othervm -XX:-TieredCompilation -XX:CompileOnly=compiler.vectorization.TestFloat16ToFloatConv::test* compiler.vectorization.TestFloat16ToFloatConv Are the additional flags really necessary for reproducing the bug? I would suspect not really. The IR framework already takes care of ensuring we run with C2 compilation. test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 47: > 45: private static int[] aI = new int[SIZE]; > 46: private static long[] aL = new long[SIZE]; > 47: private static float[] goldB, goldS, goldI, goldL; Are you testing for `char` as well? test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 55: > 53: aI[i] = RANDOM.nextInt(); > 54: aL[i] = RANDOM.nextLong(); > 55: } I would prefer if we could start using `Generators`. There is a `fill` method for arrays. It generates more "interesting" values. 
It is not super relevant here, but it would be nice if we made this common practice now ;) ------------- PR Review: https://git.openjdk.org/jdk/pull/23939#pullrequestreview-2670202061 PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1986858150 PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1986860021 PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1986861107 PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1986863298 From epeter at openjdk.org Mon Mar 10 10:16:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Mar 2025 10:16:00 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input In-Reply-To: <1TIUz8O4xjCwMArQYWlPJ4qRR9SBVpux0cceH9m2X5k=.521532a4-9ea7-4031-aa98-a60ce2c8982a@github.com> References: <1TIUz8O4xjCwMArQYWlPJ4qRR9SBVpux0cceH9m2X5k=.521532a4-9ea7-4031-aa98-a60ce2c8982a@github.com> Message-ID: On Fri, 7 Mar 2025 16:43:20 GMT, Sandhya Viswanathan wrote: >> Good. > > Thanks a lot @vnkozlov for the review and approval. @sviswa7 thanks for looking at this! 
The fix looks good, there are just a few comments about the test :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23939#issuecomment-2710065337 From epeter at openjdk.org Mon Mar 10 10:16:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Mar 2025 10:16:01 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input In-Reply-To: References: Message-ID: <8BEb9jmlUKF-zofMQae3H13sxWO1NWSBSR-wXx42sk0=.f9f3e99f-8872-4364-a8ec-c74c823b2ea3@github.com> On Fri, 7 Mar 2025 16:33:32 GMT, Sandhya Viswanathan wrote: >> test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 62: >> >>> 60: } >>> 61: >>> 62: @Test >> >> Suggestion: >> >> @Test >> @IR(failOn = { IRNode.VECTOR_CAST_HF2F }, applyIfCPUFeatureOr = { "avx512vl", "true", "f16c", "true" }) > > I left out the IR check because we do intend to vectorize this going forward. Instead the bug fix is verified by checkResult. Also the fix is not specific to Intel platform so if we do add IR check it will need to be generic. > @eme64 your thoughts please? Would you like to see an IR check here that vectorization is not happening? Personally, I generally prefer to have `failOn` IR rules, if we expect that at least for now there should be no vectorization. But add a comment why we expect no vectorization, so that if it ever does vectorize and the IR rule fails the person has a hint, and does not have to reverse-engineer too much. And if it turns out that we should one day vectorize, then we already have all these tests ready to just flip the `failOn` into `count`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1986987697 From roland at openjdk.org Mon Mar 10 10:21:11 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 10 Mar 2025 10:21:11 GMT Subject: RFR: 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly [v4] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 11:09:07 GMT, Christian Hagedorn wrote: >> This patch is a preparatory patch for fixing https://github.com/openjdk/jdk/pull/23823 (already out for review) without relying on predicate matching during IGVN (currently proposed matching with IGVN but after looking into it and further discussions with @rwestrel, we think it's best to do it outside IGVN). >> >> ### Update Assertion Predicate Killing Mechanism >> The main contribution of this patch is to update the killing mechanism of Assertion Predicates. Currently, we kill an Assertion Predicate by replacing its `Opaque*AssertionPredicate` node with `ConI [int:1]`. This creates problems in our predicate matching code (see for example, [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637)). To fix this and prepare for https://github.com/openjdk/jdk/pull/23823, we move to a different approach. >> >> #### Mark Opaque*AssertionPredicate` Nodes Useless >> Instead of directly inserting constants, we mark `Opaque*AssertionPredicate` nodes useless when we find that an Assertion Predicate should be removed. In the next IGVN phase, we are folding it by taking the success path. This requires the following updates: >> - Introduce `_useless` flags at `Opaque*AssertionPredicate` nodes and associated mark methods. >> - Check the flags in `Value()` and return `TypeInt::ONE` if we find them useless. >> - Adding `cmp()` method for `OpaqueInitializedAssertionPredicate` to avoid commoning up a useless with a useful node. 
>> - `OpaqueTemplateAssertionPredicate` nodes are by design unique to the Template Assertion Predicate because they have non-hashable `OpaqueLoop*` nodes on their input paths. I still added `hash()` that returns `NO_HASH` as an additional guarantee/contract. >> >> #### Update Predicate Iteration Code >> To make this new approach work, we need to update the predicate iteration code. We still match Assertion Predicates with an `OpaqueAssertion*Node` to skip over them but we skip to call the `visit()` method since there should not be any work to be done for killed Assertion Predicates. This includes the additional change: >> - Refactoring `RegularPredicateBlockIterator::for_each()` which became bigger. I extracted some methods and decided to use a field instead of a local variable. This meant that I also had to remove `const` from the `for_each()` methods which I think is fine for a method that does iterations and needs to do some book keeping. >> >> #### Other Updates >> I've also applied some small refactorings of touched code. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update mark_useless for OpaqueMultiversioning Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23941#pullrequestreview-2670438915 From duke at openjdk.org Mon Mar 10 10:23:01 2025 From: duke at openjdk.org (Marc Chevalier) Date: Mon, 10 Mar 2025 10:23:01 GMT Subject: RFR: 8346989: Deoptimization and re-compilation cycle with C2 compiled code In-Reply-To: <_3l8ylsbgvsqQE1Ihp0BUAx2o_VzcS6R2jWBSKW9u1E=.0dcb6086-ff6f-4c9a-b990-6665a476a3dc@github.com> References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> <_3l8ylsbgvsqQE1Ihp0BUAx2o_VzcS6R2jWBSKW9u1E=.0dcb6086-ff6f-4c9a-b990-6665a476a3dc@github.com> Message-ID: On Fri, 7 Mar 2025 18:03:14 GMT, Vladimir Ivanov wrote: >> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. >> >> tl;dr: >> - C1: no problem, no change >> - C2: >> - with intrinsics: >> - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms) >> - without overflow: no problem, no change >> - without intrinsics: no problem, no change >> >> Before the fix: >>
>> Benchmark (SIZE) Mode Cnt Score Error Units
>> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ± 0.048 ms/op
>> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ± 58.238 ms/op
>> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ± 0.842 ms/op
>> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ± 229.425 ms/op
>> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ± 22.244 ms/op
>> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ± 279.003 ms/op
>> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ± 0.810 ms/op
>> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ± 141.792 ms/op
>> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ± 12.822 ms/op
>> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ± 274.396 ms/op
>> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ± 3.316 ms/op
>> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ± 71.879 ms/op
>> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ± 0.587 ms/op
>> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ± 252.137 ms/op
>> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ± 5.718 ms/op
>> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ± 147.432 ms/op
>> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ± 0.027 ms/op
>> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ± 30.841 ms/op
>> MathExact.C1_1.loop...
> > src/hotspot/share/opto/library_call.cpp line 1963: > >> 1961: set_i_o(i_o()); >> 1962: >> 1963: uncommon_trap(Deoptimization::Reason_intrinsic, > > What about using `builtin_throw` here? (Requires some tuning on `builtin_throw` side.) How much does it affect performance? Also, passing `must_throw = true` into `uncommon_trap` may help a bit here as well. Using `builtin_throw` sounds nice! But indeed, it won't work so directly. I want to prevent intrinsic in case of `too_many_traps`. But that's only when `builtin_throw` will do something. But if I only rely on `builtin_throw`, then, when the built-in throwing is not possible (that is when `treat_throw_as_hot && method()->can_omit_stack_trace()` is false), we will have the repeated deopt again. There is also throwing the right exception, which is right now determined only by the reason (which adapts poorly to this case). I guess that's what you meant by tuning: be able to know if we would built-in throw, and if so, do it, otherwise, prevent infinitely repeated deopt.
The way I see doing that is by (maybe optionally) providing the preallocated exception to throw as a parameter so that we don't have to rely on the "reason to exception" decision (or we can override it), and factor out the decision whether we can take the nice branch of `builtin_throw` so that we can bail out of intrinsic if we can't fast throw before we start setting up the intrinsic (that we would then need to undo). Does that match what you had in mind or do you have another suggestion? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r1986999005 From duke at openjdk.org Mon Mar 10 10:26:06 2025 From: duke at openjdk.org (David Linus Briemann) Date: Mon, 10 Mar 2025 10:26:06 GMT Subject: Integrated: 8350325: [PPC64] ConvF2HFIdealizationTests timeouts on Power8 In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 12:02:01 GMT, David Linus Briemann wrote: > Skip ConvF2HFIdealizationTests for Power8 This pull request has now been integrated. Changeset: f61f520e Author: David Linus Briemann Committer: Martin Doerr URL: https://git.openjdk.org/jdk/commit/f61f520e699e3eb5104c9467ec8269b837da74db Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8350325: [PPC64] ConvF2HFIdealizationTests timeouts on Power8 Reviewed-by: mdoerr, clanger ------------- PR: https://git.openjdk.org/jdk/pull/23692 From duke at openjdk.org Mon Mar 10 10:58:31 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Mon, 10 Mar 2025 10:58:31 GMT Subject: RFR: 8330469 : Removed or replaced (PrintOpto && VerifyLoopOptimizations) … Message-ID: <0u5jTfXdZR63O78QixBJEql2WKWLvcyaeqsN_gAL3OA=.40f718c3-b99e-4183-80e1-fb3b5dc6074e@github.com> **Issue:** There are currently 9 occurrences where we guard printing code with PrintOpto && VerifyLoopOptimizations. This flag combo is never really used in practice. **Solution**: I analysed the 9 occurrences.
In cases where the `PrintOpto && VerifyLoopOptimizations` guard was followed by the `TraceLoopOpts` flag, via `else if` or the `||` operator, I removed the former. In the other cases, where `PrintOpto && VerifyLoopOptimizations` was the only guard, I replaced it with `TraceLoopOpts`. **Test Result**: Link to [GitHub Action](https://github.com/sarannat/jdk/actions/runs/13723071055) run on commit [91ecc51](https://github.com/sarannat/jdk/commit/91ecc5190ce31da94bded4de210136f337286e69) ------------- Commit messages: - 8330469 : Removed or replaced (PrintOpto && VerifyLoopOptimizations) with TraceLoopOpts Changes: https://git.openjdk.org/jdk/pull/23959/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23959&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330469 Stats: 17 lines in 3 files changed: 2 ins; 6 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23959.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23959/head:pull/23959 PR: https://git.openjdk.org/jdk/pull/23959 From epeter at openjdk.org Mon Mar 10 10:58:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Mar 2025 10:58:27 GMT Subject: RFR: 8350485: C2: factor out common code in Node::grow() and Node::out_grow() In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 09:08:41 GMT, Saranya Natarajan wrote: > Node::grow() and Node::out_grow() are copy-pasted from each other and their core logic could be factored out into a third function or at least cleaned up. Hence, the fix includes a function Node::array_resize() that implements the core logic of Node::grow() and Node::out_grow(). > > Link to GitHub action which had no failures: https://github.com/sarannat/jdk/actions/runs/13677508359 @sarannat I'm mostly leaving code style comments.
I promise I won't be so pedantic in the future ;)

It may be good for you to read through this, at least to have an overview:
https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md

src/hotspot/share/opto/node.cpp line 667:

> 665: }
> 666:
> 667: //------------------------------resize_array-------------------------------------------

Suggestion:

Just FYI. You can have these headers, but it is not expected any more. So I usually leave them out. But up to you.

src/hotspot/share/opto/node.cpp line 670:

> 668: // Resize input or output array to grow it the next larger power-of-2 bigger
> 669: // than len.
> 670: void Node::resize_array( Node**& array, node_idx_t& max_size, uint len, bool is_in) {

Suggestion:

void Node::resize_array(Node**& array, node_idx_t& max_size, uint len, bool is_in) {

Code style

src/hotspot/share/opto/node.cpp line 673:

> 671: Arena* arena = Compile::current()->node_arena();
> 672: uint new_max = max_size;
> 673: if( new_max == 0 ) {

Suggestion:

if(new_max == 0) {

While we are here we might as well fix this too.

src/hotspot/share/opto/node.cpp line 675:

> 673: if( new_max == 0 ) {
> 674: max_size = 4;
> 675: array = (Node**)arena->Amalloc(4*sizeof(Node*));

Suggestion:

array = (Node**)arena->Amalloc(4 * sizeof(Node*));

We generally want to have spaces around operators. See:
https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md

> Use spaces around operators, especially comparisons and assignments. (Relaxable for boolean expressions and high-precedence operators in classic math-style formulas.)

src/hotspot/share/opto/node.cpp line 683:

> 681: }
> 682: return;
> 683: }

Suggestion:

}
return;
}

Indentation

src/hotspot/share/opto/node.hpp line 340:

> 338: // Resize input or output array to grow it the next larger power-of-2 bigger
> 339: // than len.
> 340: void resize_array(Node **&array, node_idx_t &max_size, uint len, bool is_in);

Suggestion:

void resize_array(Node**& array, node_idx_t &max_size, uint len, bool is_in);

Nit code style ;)

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23928#pullrequestreview-2667282496
PR Review Comment: https://git.openjdk.org/jdk/pull/23928#discussion_r1985077851
PR Review Comment: https://git.openjdk.org/jdk/pull/23928#discussion_r1985076647
PR Review Comment: https://git.openjdk.org/jdk/pull/23928#discussion_r1985078681
PR Review Comment: https://git.openjdk.org/jdk/pull/23928#discussion_r1985080375
PR Review Comment: https://git.openjdk.org/jdk/pull/23928#discussion_r1985080970
PR Review Comment: https://git.openjdk.org/jdk/pull/23928#discussion_r1985075533

From thartmann at openjdk.org Mon Mar 10 10:58:27 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 10 Mar 2025 10:58:27 GMT
Subject: RFR: 8350485: C2: factor out common code in Node::grow() and Node::out_grow()
In-Reply-To:
References:
Message-ID:

On Thu, 6 Mar 2025 09:08:41 GMT, Saranya Natarajan wrote:

> Node::grow() and Node::out_grow() are copy-pasted from each other and their core logic could be factored out into a third function or at least cleaned up. Hence, the fix includes a function Node::resize_array() that implements the core logic of Node::grow() and Node::out_grow().
>
> Link to the GitHub Actions run, which had no failures: https://github.com/sarannat/jdk/actions/runs/13677508359

Congratulations on your first PR, Saranya! :partying_face:

I also left a code style comment but looks good to me otherwise.

src/hotspot/share/opto/node.cpp line 688:

> 686: // Previously I was using only powers-of-2 which peaked at 128 edges.
> 687: //if( new_max >= limit ) new_max = limit-1;
> 688: if(!is_in){

Suggestion:

if (!is_in) {

Same below.

-------------

Changes requested by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23928#pullrequestreview-2667389492
PR Review Comment: https://git.openjdk.org/jdk/pull/23928#discussion_r1985138147

From duke at openjdk.org Mon Mar 10 10:58:26 2025
From: duke at openjdk.org (Saranya Natarajan)
Date: Mon, 10 Mar 2025 10:58:26 GMT
Subject: RFR: 8350485: C2: factor out common code in Node::grow() and Node::out_grow()
Message-ID:

Node::grow() and Node::out_grow() are copy-pasted from each other and their core logic could be factored out into a third function or at least cleaned up. Hence, the fix includes a function Node::resize_array() that implements the core logic of Node::grow() and Node::out_grow().

Link to the GitHub Actions run, which had no failures: https://github.com/sarannat/jdk/actions/runs/13677508359

-------------

Commit messages:
 - JDK-8350485: Code formatting to array_resize
 - JDK-8350485:Code formatting and comments added to resize_array
 - JDK-8350485: Added resize_array

Changes: https://git.openjdk.org/jdk/pull/23928/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23928&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8350485
Stats: 51 lines in 2 files changed: 16 ins; 17 del; 18 mod
Patch: https://git.openjdk.org/jdk/pull/23928.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23928/head:pull/23928

PR: https://git.openjdk.org/jdk/pull/23928

From epeter at openjdk.org Mon Mar 10 10:58:27 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 10 Mar 2025 10:58:27 GMT
Subject: RFR: 8350485: C2: factor out common code in Node::grow() and Node::out_grow()
In-Reply-To:
References:
Message-ID:

On Fri, 7 Mar 2025 13:36:58 GMT, Emanuel Peter wrote:

>> Node::grow() and Node::out_grow() are copy-pasted from each other and their core logic could be factored out into a third function or at least cleaned up. Hence, the fix includes a function Node::resize_array() that implements the core logic of Node::grow() and Node::out_grow().
>> >> Link to Github action which had no failures : https://github.com/sarannat/jdk/actions/runs/13677508359 > > src/hotspot/share/opto/node.hpp line 340: > >> 338: // Resize input or output array to grow it the next larger power-of-2 bigger >> 339: // than len. >> 340: void resize_array(Node **&array, node_idx_t &max_size, uint len, bool is_in); > > Suggestion: > > void resize_array(Node**& array, node_idx_t &max_size, uint len, bool is_in); > > Nit code style ;) What does `is_in` mean here? Feel free to give it a longer more expressive name ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23928#discussion_r1985081817 From dnsimon at openjdk.org Mon Mar 10 11:06:00 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 10 Mar 2025 11:06:00 GMT Subject: RFR: 8346825: [JVMCI] Remove NativeImageReinitialize annotation In-Reply-To: References: Message-ID: On Sun, 9 Mar 2025 19:07:54 GMT, Doug Simon wrote: > The `jdk.vm.ci.common.NativeImageReinitialize` annotation was introduced to reset JVMCI and Graal fields to their default values as they are copied into the libgraal image. Now that class loader separation is used to isolate the JVMCI and Graal classes compiled to produce libgraal from the JVMCI and Graal classes being executed to do the AOT compilation, the need for this field resetting is no longer needed. This PR removes the `NativeImageReinitialize` annotation. Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23957#issuecomment-2710203461 From dnsimon at openjdk.org Mon Mar 10 11:06:00 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 10 Mar 2025 11:06:00 GMT Subject: Integrated: 8346825: [JVMCI] Remove NativeImageReinitialize annotation In-Reply-To: References: Message-ID: On Sun, 9 Mar 2025 19:07:54 GMT, Doug Simon wrote: > The `jdk.vm.ci.common.NativeImageReinitialize` annotation was introduced to reset JVMCI and Graal fields to their default values as they are copied into the libgraal image. 
Now that class loader separation is used to isolate the JVMCI and Graal classes compiled to produce libgraal from the JVMCI and Graal classes being executed to do the AOT compilation, the need for this field resetting is no longer needed. This PR removes the `NativeImageReinitialize` annotation. This pull request has now been integrated. Changeset: 99547c5b Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/99547c5b254807580e0a5238b95d55d38181f4fc Stats: 69 lines in 10 files changed: 0 ins; 44 del; 25 mod 8346825: [JVMCI] Remove NativeImageReinitialize annotation Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/23957 From epeter at openjdk.org Mon Mar 10 12:04:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Mar 2025 12:04:29 GMT Subject: RFR: 8351414: C2: MergeStores must happen after RangeCheck smearing Message-ID: With [JDK-8348959](https://bugs.openjdk.org/browse/JDK-8348959) we see that there can be some issues when RangeCheck smearing happens in the same IGVN phase as MergeStores. It means that some RangeChecks are still around as we do MergeStores, and then we cannot merge as many stores as we would like. We should ensure that RangeCheck smearing happens during post-loop-opts, and then MergeStores happens in a separate dedicated IGVN round afterwards. 
------------- Commit messages: - actually do igvn and add print_method - clear flag after clone - more fix, and more comments - JDK-8351414 Changes: https://git.openjdk.org/jdk/pull/23944/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23944&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351414 Stats: 107 lines in 8 files changed: 103 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23944.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23944/head:pull/23944 PR: https://git.openjdk.org/jdk/pull/23944 From qamai at openjdk.org Mon Mar 10 12:04:29 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 10 Mar 2025 12:04:29 GMT Subject: RFR: 8351414: C2: MergeStores must happen after RangeCheck smearing In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 15:07:37 GMT, Emanuel Peter wrote: > With [JDK-8348959](https://bugs.openjdk.org/browse/JDK-8348959) we see that there can be some issues when RangeCheck smearing happens in the same IGVN phase as MergeStores. It means that some RangeChecks are still around as we do MergeStores, and then we cannot merge as many stores as we would like. We should ensure that RangeCheck smearing happens during post-loop-opts, and then MergeStores happens in a separate dedicated IGVN round afterwards. Do you think it would make sense to make a dedicated `PhaseMergeStores` instead? I would imagine `PhaseMergeStores` that looks at all stores in the graph, does the analysis and the transformation, then we can run another round of IGVN after that. I think a global view would be easier and more efficient than the local view from `StoreNode::Ideal`. Since `StoreNode::Ideal` needs to ensure that there is no store after it and just bail out otherwise. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23944#issuecomment-2706781634 PR Comment: https://git.openjdk.org/jdk/pull/23944#issuecomment-2706834193 From epeter at openjdk.org Mon Mar 10 12:04:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Mar 2025 12:04:29 GMT Subject: RFR: 8351414: C2: MergeStores must happen after RangeCheck smearing In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 15:47:33 GMT, Quan Anh Mai wrote: > Do you think it would make sense to make a dedicated `PhaseMergeStores` instead? Hmm, maybe? Can you say a little more how you imagine it? @merykitty I basically just copied the structure from `post_loop_opts_phase`... > I would imagine `PhaseMergeStores` that looks at all stores in the graph, does the analysis and the transformation, then we can run another round of IGVN after that. I think a global view would be easier and more efficient than the local view from `StoreNode::Ideal`. Since `StoreNode::Ideal` needs to ensure that there is no store after it and just bail out otherwise. @merykitty I'm not sure if I understood you right. You seem to say that I should do this (please correct me): - Traverse the WHOLE graph, looking for stores (I would like to avoid that, that's why I carry around the list). - Rewrite the logic in MergeStores, and somehow have a global view rather than the local one. That's lots of work and I'm not sure I want to invest the time, unless it is somehow clearly better. This is really a fix for Valhalla, see https://bugs.openjdk.org/browse/JDK-8348959, so it would be nice to fix this rather sooner. 
If someone wants to then refactor the code later, that's fine with me ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23944#issuecomment-2706816766 PR Comment: https://git.openjdk.org/jdk/pull/23944#issuecomment-2706829601 PR Comment: https://git.openjdk.org/jdk/pull/23944#issuecomment-2706858419 From qamai at openjdk.org Mon Mar 10 12:04:29 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 10 Mar 2025 12:04:29 GMT Subject: RFR: 8351414: C2: MergeStores must happen after RangeCheck smearing In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 16:13:01 GMT, Emanuel Peter wrote: >> I would imagine `PhaseMergeStores` that looks at all stores in the graph, does the analysis and the transformation, then we can run another round of IGVN after that. I think a global view would be easier and more efficient than the local view from `StoreNode::Ideal`. Since `StoreNode::Ideal` needs to ensure that there is no store after it and just bail out otherwise. > > @merykitty I'm not sure if I understood you right. You seem to say that I should do this (please correct me): > - Traverse the WHOLE graph, looking for stores (I would like to avoid that, that's why I carry around the list). > - Rewrite the logic in MergeStores, and somehow have a global view rather than the local one. That's lots of work and I'm not sure I want to invest the time, unless it is somehow clearly better. > > This is really a fix for Valhalla, see https://bugs.openjdk.org/browse/JDK-8348959, so it would be nice to fix this rather sooner. If someone wants to then refactor the code later, that's fine with me ;) @eme64 > Traverse the WHOLE graph, looking for stores (I would like to avoid that, that's why I carry around the list). I think keeping a list is fine. > Rewrite the logic in MergeStores, and somehow have a global view rather than the local one. That's lots of work and I'm not sure I want to invest the time, unless it is somehow clearly better. 
I think the logic would still be the same. We start at a store then try to find the last store in the chain, then group the stores and do the merge. After that, we can remove the replaced stores from the work list. It is global in the sense that we can freely walk the graph instead of being restricted to the current node that is invoking `Ideal`. This leads to a series of:

    Status status_use = find_adjacent_use_store(_store);
    if (status_use.found_store() != nullptr) {
      return nullptr;
    }

while we can do

    while (next != nullptr) {
      StoreNode* last = next;
      next = find_adjacent_use_store(last);
    }

It is also better because idealisation of a store node may be invoked several times, leading to useless `find_adjacent_use_store` invocations.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23944#issuecomment-2706891522

From duke at openjdk.org Mon Mar 10 12:04:29 2025
From: duke at openjdk.org (kuaiwei)
Date: Mon, 10 Mar 2025 12:04:29 GMT
Subject: RFR: 8351414: C2: MergeStores must happen after RangeCheck smearing
In-Reply-To:
References:
Message-ID:

On Fri, 7 Mar 2025 16:13:01 GMT, Emanuel Peter wrote:

>> I would imagine `PhaseMergeStores` that looks at all stores in the graph, does the analysis and the transformation, then we can run another round of IGVN after that. I think a global view would be easier and more efficient than the local view from `StoreNode::Ideal`. Since `StoreNode::Ideal` needs to ensure that there is no store after it and just bail out otherwise.
>
> @merykitty I'm not sure if I understood you right. You seem to say that I should do this (please correct me):
> - Traverse the WHOLE graph, looking for stores (I would like to avoid that, that's why I carry around the list).
> - Rewrite the logic in MergeStores, and somehow have a global view rather than the local one. That's lots of work and I'm not sure I want to invest the time, unless it is somehow clearly better.
>
> This is really a fix for Valhalla, see https://bugs.openjdk.org/browse/JDK-8348959, so it would be nice to fix this rather sooner. If someone wants to then refactor the code later, that's fine with me ;)

@eme64 I'm working on merging loads and ran into the same problem. I work around it by delaying the LoadNode transformation until after all range check smearing, but I think your solution is better. Could you rename "_for_merge_stores_igvn" to "_for_merge_mem_igvn" so that it can be used for both stores and loads?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23944#issuecomment-2709247260

From chagedorn at openjdk.org Mon Mar 10 12:04:53 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 10 Mar 2025 12:04:53 GMT
Subject: RFR: 8330469 : Removed or replaced (PrintOpto && VerifyLoopOptimizations) …
In-Reply-To: <0u5jTfXdZR63O78QixBJEql2WKWLvcyaeqsN_gAL3OA=.40f718c3-b99e-4183-80e1-fb3b5dc6074e@github.com>
References: <0u5jTfXdZR63O78QixBJEql2WKWLvcyaeqsN_gAL3OA=.40f718c3-b99e-4183-80e1-fb3b5dc6074e@github.com>
Message-ID:

On Mon, 10 Mar 2025 10:34:19 GMT, Saranya Natarajan wrote:

> **Issue:** There are currently 9 occurrences where we guard printing code with PrintOpto && VerifyLoopOptimizations. This flag combo is never really used in practice.
>
> **Solution**: I analysed the 9 occurrences. In cases where the `PrintOpto && VerifyLoopOptimizations` check was combined with `TraceLoopOpts` through an `else if` or the `||` operator, I removed the former check. In the other cases, where `PrintOpto && VerifyLoopOptimizations` was the only check, I replaced it with `TraceLoopOpts`.
>
> **Test Result**: Link to [GitHub Action](https://github.com/sarannat/jdk/actions/runs/13723071055) run on commit [91ecc51](https://github.com/sarannat/jdk/commit/91ecc5190ce31da94bded4de210136f337286e69)

Thanks for cleaning this up! I have a few suggestions.
src/hotspot/share/opto/loopTransform.cpp line 756: > 754: if (TraceLoopOpts) { > 755: tty->print("Peeling a 'main' loop; resetting to 'normal' "); > 756: loop->dump_head(); I think you can remove `loop->dump_head()` since you already dump the head on L734. src/hotspot/share/opto/loopopts.cpp line 842: > 840: if (TraceLoopOpts) { > 841: tty->print_cr("CMOV"); > 842: } I think you can just remove this since we already print "CMOV" down on L866. src/hotspot/share/opto/loopopts.cpp line 851: > 849: if (m != nullptr && !is_dominator(get_ctrl(m), cmov_ctrl)) { > 850: #ifndef PRODUCT > 851: if (TraceLoopOpts) { I think it's too verbose for `TraceLoopOpts` which should only dump the high-level information. Since there is only one place where we dump additional information for this cmove optimization, I suggest to just drop this. If we want to trace the cmove optimization at some point, we might better introduce a "TraceConditionalMove" flag. src/hotspot/share/opto/split_if.cpp line 143: > 141: tty->print("Cloning up: "); > 142: n->dump(); > 143: } Same here with this print and the other places in `clone_cmp_down()`: I think it's too verbose for `TraceLoopOpts`. Since Split-If is quite complex, I think it would make sense to add a "TraceSplitIf" flag to get more information about the optimization. It's probably out of scope of this bug, so we could do that in a separate RFE. For this PR, I suggest to just drop these printings and link this PR to the "TraceSplitIf" RFE in order to restore/update/improve these. 
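For readers following the flag discussion, the two rewrites described in the PR can be sketched in isolation. This is a minimal model, not HotSpot code: the three flags are plain globals here (in HotSpot they are VM flags), and the four function names are hypothetical, introduced only for illustration.

```cpp
#include <cassert>

// Stand-ins for the HotSpot flags named in the thread (assumption: plain
// globals here; the real flags are declared through HotSpot's flag tables).
bool PrintOpto = false;
bool VerifyLoopOptimizations = false;
bool TraceLoopOpts = false;

// Case 1, before: the rarely used flag combination sits next to TraceLoopOpts.
bool combined_guard_before() {
  return (PrintOpto && VerifyLoopOptimizations) || TraceLoopOpts;
}

// Case 1, after: the combination is dropped, only TraceLoopOpts remains.
bool combined_guard_after() {
  return TraceLoopOpts;
}

// Case 2, before: the combination is the only guard around the printing code.
bool sole_guard_before() {
  return PrintOpto && VerifyLoopOptimizations;
}

// Case 2, after: the combination is replaced by TraceLoopOpts.
bool sole_guard_after() {
  return TraceLoopOpts;
}
```

With only `TraceLoopOpts` set, both "after" guards fire, and the old sole guard stays silent. That is also why some of these prints become more visible after the change, which is the verbosity concern raised in the review comments above.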
------------- PR Review: https://git.openjdk.org/jdk/pull/23959#pullrequestreview-2670661693 PR Review Comment: https://git.openjdk.org/jdk/pull/23959#discussion_r1987122635 PR Review Comment: https://git.openjdk.org/jdk/pull/23959#discussion_r1987125824 PR Review Comment: https://git.openjdk.org/jdk/pull/23959#discussion_r1987141524 PR Review Comment: https://git.openjdk.org/jdk/pull/23959#discussion_r1987147472 From epeter at openjdk.org Mon Mar 10 12:04:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Mar 2025 12:04:29 GMT Subject: RFR: 8351414: C2: MergeStores must happen after RangeCheck smearing In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 16:28:56 GMT, Quan Anh Mai wrote: >> @merykitty I'm not sure if I understood you right. You seem to say that I should do this (please correct me): >> - Traverse the WHOLE graph, looking for stores (I would like to avoid that, that's why I carry around the list). >> - Rewrite the logic in MergeStores, and somehow have a global view rather than the local one. That's lots of work and I'm not sure I want to invest the time, unless it is somehow clearly better. >> >> This is really a fix for Valhalla, see https://bugs.openjdk.org/browse/JDK-8348959, so it would be nice to fix this rather sooner. If someone wants to then refactor the code later, that's fine with me ;) > > @eme64 > >> Traverse the WHOLE graph, looking for stores (I would like to avoid that, that's why I carry around the list). > > I think keeping a list is fine. > >> Rewrite the logic in MergeStores, and somehow have a global view rather than the local one. That's lots of work and I'm not sure I want to invest the time, unless it is somehow clearly better. > > I think the logic would still be the same. We start at a store then try to find the last store in the chain, then group the stores and do the merge. After that, we can remove the replaced stores from the work list. 
It is global in the sense that we can freely walk the graph instead of being restricted to the current node that is invoking `Ideal`. This leads to a series of: > > Status status_use = find_adjacent_use_store(_store); > if (status_use.found_store() != nullptr) { > return nullptr; > } > > while we can do > > while (next != nullptr) { > StoreNode* last = next; > next = find_adjacent_use_store(last); > } > > It is also better because idealisation of a store node may be invoked several times, leading to useless `find_adjacent_use_store` invocations. @merykitty Thanks for giving more details. I agree that your idea would lead to some fewer adjacency checks, and so it would be somewhat desirable to do that. However, splitting out the `merge_stores` list would have to be done anyway, and that is almost all of the code I have here, so this here is a step in the right direction. Using IGVN is fine in my view, especially because I'm not newly introducing it here, but just moving it. I personally have decided to only put minimal effort into MergeStores, my priorities are elsewhere. This issue here is primarily addressing the problems in the Valhalla repo, where the issues with the order of RC smearing and MergeStores is somehow more apparent than on mainline. So if you feel that this idea is important to you, feel free to file an RFE, maybe someone else wants to take this effort on. @kuaiwei I think for now I'll leave it with `merge_stores`, but once you implement the `merge_loads`, you can just rename it. I'm not sure yet what would be the best name. `merge_mem` reminds me too much of the `MergeMemNode`. Maybe `merge_loads_and_stores` or `merge_memops` could be a better alternative. 
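The chain walk discussed in this exchange (start at any store and follow adjacent use stores until the last one) can be sketched as a small self-contained model. Everything here is an assumption for illustration: `Store` is a hypothetical stand-in for C2's `StoreNode`, `find_adjacent_use_store` only borrows the name used in the thread and is reduced to a field read, and `find_last_store` is a made-up helper name.

```cpp
#include <cassert>

// Hypothetical stand-in for a store node: each store only knows the adjacent
// store further down the chain, if there is one.
struct Store {
  int offset;        // byte offset written by this store
  Store* use_store;  // adjacent store that follows this one, or nullptr
};

// Simplified model of the lookup named in the thread. The real analysis in C2
// inspects the graph; here it is just a field read.
Store* find_adjacent_use_store(Store* s) {
  return s->use_store;
}

// Walk from any store in a chain to the last store of that chain.
Store* find_last_store(Store* start) {
  assert(start != nullptr);
  Store* last = start;
  while (Store* next = find_adjacent_use_store(last)) {
    last = next;
  }
  return last;
}
```

In this model the walk is done once per chain, whereas the local approach bails out of every store that still has an adjacent use store and relies on a later visit of the last store in the chain.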
-------------

PR Comment: https://git.openjdk.org/jdk/pull/23944#issuecomment-2710163924

From thartmann at openjdk.org Mon Mar 10 12:06:01 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 10 Mar 2025 12:06:01 GMT
Subject: RFR: 8347405: MergeStores with reverse bytes order value [v14]
In-Reply-To:
References:
Message-ID:

On Thu, 6 Mar 2025 02:45:55 GMT, kuaiwei wrote:

>> This patch enhances the MergeStores optimization to support merging values with reversed byte order.
>>
>> Below are the benchmark results before and after the patch:
>>
>> On aliyun g8y (aarch64)
>> | name | before | after | ratio |
>> |---|---|---|---|
>> | MergeStoreBench.setCharBS | 5669.655000 | 5669.566000 | 0.00 % |
>> | MergeStoreBench.setCharBV | 5516.911000 | 5516.273000 | 0.01 % |
>> | MergeStoreBench.setCharC | 5578.644000 | 5552.809000 | 0.47 % |
>> | MergeStoreBench.setCharLS | 5782.140000 | 5779.264000 | 0.05 % |
>> | MergeStoreBench.setCharLV | 5496.403000 | 5499.195000 | -0.05 % |
>> | MergeStoreBench.setIntB | 6087.703000 | 2768.385000 | 119.90 % |
>> | MergeStoreBench.setIntBU | 6733.813000 | 2950.240000 | 128.25 % |
>> | MergeStoreBench.setIntBV | 1362.233000 | 1361.821000 | 0.03 % |
>> | MergeStoreBench.setIntL | 2834.785000 | 2833.042000 | 0.06 % |
>> | MergeStoreBench.setIntLU | 2947.145000 | 2946.874000 | 0.01 % |
>> | MergeStoreBench.setIntLV | 5506.791000 | 5506.229000 | 0.01 % |
>> | MergeStoreBench.setIntRB | 7634.279000 | 5611.058000 | 36.06 % |
>> | MergeStoreBench.setIntRBU | 7766.737000 | 5551.281000 | 39.91 % |
>> | MergeStoreBench.setIntRL | 5689.793000 | 5689.385000 | 0.01 % |
>> | MergeStoreBench.setIntRLU | 5628.287000 | 5628.789000 | -0.01 % |
>> | MergeStoreBench.setIntRU | 5536.039000 | 5534.910000 | 0.02 % |
>> | MergeStoreBench.setIntU | 5595.363000 | 5567.810000 | 0.49 % |
>> | MergeStoreBench.setLongB | 13722.671000 | 6811.098000 | 101.48 % |
>> | MergeStoreBench.setLongBU | 13728.844000 | 4280.240000 | 220.75 % |
>> | MergeStoreBench.setLongBV | 2785.255000 | 2785.949000 | -0.02 % |
>> | MergeStoreBench.setLongL | 5714.615000 | 5710.402000 | 0.07 % |
>> | MergeStoreBench.setLongLU | 4128.746000 | 4129.324000 | -0.01 % |
>> | MergeStoreBench.setLongLV | 2793.125000 | 2794.438000 | -0.05 % |
>> | MergeStoreBench.setLongRB | 14465.223000 | 7015.050000 | 106.20 % |
>> | MergeStoreBench.setLongRBU | 14546.954000 | 6173.210000 | 135.65 % |
>> | MergeStoreBench.setLongRL | 6816.145000 | 6813.348000 | 0.04 % |
>> | MergeStoreBench.setLongRLU | 4289.445000 | 4284.239000 | 0.12 % |
>> | MergeStoreBench.setLongRU | 3132.471000 | 3133.093000 | -0.02 % |
>> | MergeStoreBench.setLongU | 3086.779000 | 3087.298000 | -0.02 % |
>>
>> AMD EPYC 9T24
>> ...
>
> kuaiwei has updated the pull request incrementally with one additional commit since the last revision:
>
> Update riscv comments

Looks good to me too.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23030#pullrequestreview-2670714749

From chagedorn at openjdk.org Mon Mar 10 12:12:03 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 10 Mar 2025 12:12:03 GMT
Subject: RFR: 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly [v4]
In-Reply-To:
References:
Message-ID: <9EKfE8c1GVcXPvMSFRTL1LmfiPgahsEmc3-OpiZgf0s=.5c7823df-4b3f-45a9-8ae2-407b21c3c177@github.com>

On Fri, 7 Mar 2025 11:09:07 GMT, Christian Hagedorn wrote:

>> This patch is a preparatory patch for fixing https://github.com/openjdk/jdk/pull/23823 (already out for review) without relying on predicate matching during IGVN (currently proposed matching with IGVN but after looking into it and further discussions with @rwestrel, we think it's best to do it outside IGVN).
>>
>> ### Update Assertion Predicate Killing Mechanism
>> The main contribution of this patch is to update the killing mechanism of Assertion Predicates. Currently, we kill an Assertion Predicate by replacing its `Opaque*AssertionPredicate` node with `ConI [int:1]`.
This creates problems in our predicate matching code (see for example, [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637)). To fix this and prepare for https://github.com/openjdk/jdk/pull/23823, we move to a different approach. >> >> #### Mark Opaque*AssertionPredicate` Nodes Useless >> Instead of directly inserting constants, we mark `Opaque*AssertionPredicate` nodes useless when we find that an Assertion Predicate should be removed. In the next IGVN phase, we are folding it by taking the success path. This requires the following updates: >> - Introduce `_useless` flags at `Opaque*AssertionPredicate` nodes and associated mark methods. >> - Check the flags in `Value()` and return `TypeInt::ONE` if we find them useless. >> - Adding `cmp()` method for `OpaqueInitializedAssertionPredicate` to avoid commoning up a useless with a useful node. >> - `OpaqueTemplateAssertionPredicate` nodes are by design unique to the Template Assertion Predicate because they have non-hashable `OpaqueLoop*` nodes on their input paths. I still added `hash()` that returns `NO_HASH` as an additional guarantee/contract. >> >> #### Update Predicate Iteration Code >> To make this new approach work, we need to update the predicate iteration code. We still match Assertion Predicates with an `OpaqueAssertion*Node` to skip over them but we skip to call the `visit()` method since there should not be any work to be done for killed Assertion Predicates. This includes the additional change: >> - Refactoring `RegularPredicateBlockIterator::for_each()` which became bigger. I extracted some methods and decided to use a field instead of a local variable. This meant that I also had to remove `const` from the `for_each()` methods which I think is fine for a method that does iterations and needs to do some book keeping. >> >> #### Other Updates >> I've also applied some small refactorings of touched code. 
>> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update mark_useless for OpaqueMultiversioning Thanks Roland for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23941#issuecomment-2710370411 From chagedorn at openjdk.org Mon Mar 10 12:12:03 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 10 Mar 2025 12:12:03 GMT Subject: Integrated: 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 08:01:13 GMT, Christian Hagedorn wrote: > This patch is a preparatory patch for fixing https://github.com/openjdk/jdk/pull/23823 (already out for review) without relying on predicate matching during IGVN (currently proposed matching with IGVN but after looking into it and further discussions with @rwestrel, we think it's best to do it outside IGVN). > > ### Update Assertion Predicate Killing Mechanism > The main contribution of this patch is to update the killing mechanism of Assertion Predicates. Currently, we kill an Assertion Predicate by replacing its `Opaque*AssertionPredicate` node with `ConI [int:1]`. This creates problems in our predicate matching code (see for example, [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637)). To fix this and prepare for https://github.com/openjdk/jdk/pull/23823, we move to a different approach. > > #### Mark Opaque*AssertionPredicate` Nodes Useless > Instead of directly inserting constants, we mark `Opaque*AssertionPredicate` nodes useless when we find that an Assertion Predicate should be removed. In the next IGVN phase, we are folding it by taking the success path. This requires the following updates: > - Introduce `_useless` flags at `Opaque*AssertionPredicate` nodes and associated mark methods. > - Check the flags in `Value()` and return `TypeInt::ONE` if we find them useless. 
> - Adding `cmp()` method for `OpaqueInitializedAssertionPredicate` to avoid commoning up a useless with a useful node. > - `OpaqueTemplateAssertionPredicate` nodes are by design unique to the Template Assertion Predicate because they have non-hashable `OpaqueLoop*` nodes on their input paths. I still added `hash()` that returns `NO_HASH` as an additional guarantee/contract. > > #### Update Predicate Iteration Code > To make this new approach work, we need to update the predicate iteration code. We still match Assertion Predicates with an `OpaqueAssertion*Node` to skip over them but we skip to call the `visit()` method since there should not be any work to be done for killed Assertion Predicates. This includes the additional change: > - Refactoring `RegularPredicateBlockIterator::for_each()` which became bigger. I extracted some methods and decided to use a field instead of a local variable. This meant that I also had to remove `const` from the `for_each()` methods which I think is fine for a method that does iterations and needs to do some book keeping. > > #### Other Updates > I've also applied some small refactorings of touched code. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 4867a4c8 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/4867a4c89e99e3ba7fdd9f44e926c82216804167 Stats: 219 lines in 6 files changed: 140 ins; 25 del; 54 mod 8351280: Mark Assertion Predicates useless instead of replacing them by a constant directly Reviewed-by: epeter, roland ------------- PR: https://git.openjdk.org/jdk/pull/23941 From duke at openjdk.org Mon Mar 10 12:51:53 2025 From: duke at openjdk.org (David Linus Briemann) Date: Mon, 10 Mar 2025 12:51:53 GMT Subject: RFR: 8350866: [x86] Add C1 intrinsics for CRC32-C In-Reply-To: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> References: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> Message-ID: On Thu, 27 Feb 2025 14:30:42 GMT, David Linus Briemann wrote: > 8350866: [x86] Add C1 intrinsics for CRC32-C Local benchmarks show good improvements for the crc32c intrinsification: without intrinsic (master): $JDK/java -DmsgSize=5120 -XX:TieredStopAtLevel=1 -Xcomp TestCRC32C 300000 offset = 0 msgSize = 5120 bytes iters = 300000 ------------------------------------------------------- CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 CRC32C.update(byte[]) runtime = 1.186507782 seconds CRC32C.update(byte[]) throughput = 1294.5553525244388 MB/s CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 ------------------------------------------------------- CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 CRC32C.update(ByteBuffer) runtime = 1.355515648 seconds CRC32C.update(ByteBuffer) throughput = 1133.1481139788364 MB/s CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 ------------------------------------------------------- with intrinsic: $JDK/java -DmsgSize=5120 -XX:TieredStopAtLevel=1 -Xcomp TestCRC32C 300000 offset = 0 msgSize = 5120 bytes iters = 300000 ------------------------------------------------------- CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 CRC32C.update(byte[]) runtime = 0.065003188 seconds 
CRC32C.update(byte[]) throughput = 23629.610289267657 MB/s CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 ------------------------------------------------------- CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 CRC32C.update(ByteBuffer) runtime = 0.072310133 seconds CRC32C.update(ByteBuffer) throughput = 21241.836189127185 MB/s CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 ------------------------------------------------------- ------------- PR Comment: https://git.openjdk.org/jdk/pull/23826#issuecomment-2710476783 From duke at openjdk.org Mon Mar 10 12:55:35 2025 From: duke at openjdk.org (David Linus Briemann) Date: Mon, 10 Mar 2025 12:55:35 GMT Subject: RFR: 8350866: [x86] Add C1 intrinsics for CRC32-C [v2] In-Reply-To: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> References: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> Message-ID: > 8350866: [x86] Add C1 intrinsics for CRC32-C David Linus Briemann has updated the pull request incrementally with one additional commit since the last revision: fix typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23826/files - new: https://git.openjdk.org/jdk/pull/23826/files/2bd8d6f7..788202cb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23826&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23826&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23826.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23826/head:pull/23826 PR: https://git.openjdk.org/jdk/pull/23826 From duke at openjdk.org Mon Mar 10 12:58:39 2025 From: duke at openjdk.org (David Linus Briemann) Date: Mon, 10 Mar 2025 12:58:39 GMT Subject: RFR: 8350866: [x86] Add C1 intrinsics for CRC32-C [v3] In-Reply-To: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> References: 
<-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> Message-ID: > 8350866: [x86] Add C1 intrinsics for CRC32-C David Linus Briemann has updated the pull request incrementally with one additional commit since the last revision: fix typo again ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23826/files - new: https://git.openjdk.org/jdk/pull/23826/files/788202cb..0b93d006 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23826&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23826&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23826.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23826/head:pull/23826 PR: https://git.openjdk.org/jdk/pull/23826 From qamai at openjdk.org Mon Mar 10 13:04:02 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 10 Mar 2025 13:04:02 GMT Subject: RFR: 8351414: C2: MergeStores must happen after RangeCheck smearing In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 15:07:37 GMT, Emanuel Peter wrote: > With [JDK-8348959](https://bugs.openjdk.org/browse/JDK-8348959) we see that there can be some issues when RangeCheck smearing happens in the same IGVN phase as MergeStores. It means that some RangeChecks are still around as we do MergeStores, and then we cannot merge as many stores as we would like. We should ensure that RangeCheck smearing happens during post-loop-opts, and then MergeStores happens in a separate dedicated IGVN round afterwards. Thanks for the elaboration. LGTM. ------------- Marked as reviewed by qamai (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/23944#pullrequestreview-2670878613 From roland at openjdk.org Mon Mar 10 13:32:40 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 10 Mar 2025 13:32:40 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v11] In-Reply-To: References: Message-ID: > This change refactors `RShiftI`/`RShiftL` `Ideal`, `Identity` and > `Value` because the `int` and `long` versions are very similar, so that > there's no logic duplication. In the process, support for some extra > transformations is added to `RShiftL`. I also added some new test > cases. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision: - review + test fix - review - Merge branch 'master' into JDK-8349361 - review - Merge branch 'master' into JDK-8349361 - review - review - review - Merge branch 'master' into JDK-8349361 - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Emanuel Peter - ...
and 10 more: https://git.openjdk.org/jdk/compare/1f934b14...34e925b3 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23438/files - new: https://git.openjdk.org/jdk/pull/23438/files/d3b1cf08..34e925b3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23438&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23438&range=09-10 Stats: 43133 lines in 1281 files changed: 19838 ins; 16965 del; 6330 mod Patch: https://git.openjdk.org/jdk/pull/23438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23438/head:pull/23438 PR: https://git.openjdk.org/jdk/pull/23438 From roland at openjdk.org Mon Mar 10 13:32:40 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 10 Mar 2025 13:32:40 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v10] In-Reply-To: References: <6P25Yy-0rkWudVp20tNwD1bWeozNUD0UoPdDlJIN7wc=.b07e7461-7af0-4fab-aa8b-a737b0b40591@github.com> Message-ID: On Tue, 4 Mar 2025 08:12:16 GMT, Emanuel Peter wrote: > I'd still like to run another round of testing once we have it all finished up though, so please ping me again once you worked through my comments ;) @eme64 new commit should address all your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23438#issuecomment-2710598569 From mbaesken at openjdk.org Mon Mar 10 13:43:27 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 10 Mar 2025 13:43:27 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' Message-ID: When running jtreg tests on macOS aarch64 with ubsan - enabled binaries, in the test java/foreign/TestHandshake this error/warning is reported : jdk/src/hotspot/share/opto/block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' UndefinedBehaviorSanitizer:DEADLYSIGNAL UndefinedBehaviorSanitizer: nested bug in the same thread, aborting. 
Seems it happens in this calculation (float value does not fit into an int) : int to_pct = (int) ((100 * freq) / target->_freq); ------------- Commit messages: - JDK-8346888 Changes: https://git.openjdk.org/jdk/pull/23962/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23962&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346888 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23962.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23962/head:pull/23962 PR: https://git.openjdk.org/jdk/pull/23962 From mdoerr at openjdk.org Mon Mar 10 13:43:11 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 10 Mar 2025 13:43:11 GMT Subject: RFR: 8350866: [x86] Add C1 intrinsics for CRC32-C [v3] In-Reply-To: References: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> Message-ID: On Mon, 10 Mar 2025 12:58:39 GMT, David Linus Briemann wrote: >> 8350866: [x86] Add C1 intrinsics for CRC32-C > > David Linus Briemann has updated the pull request incrementally with one additional commit since the last revision: > > fix typo again Looks correct. Thanks for the contribution! src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp line 1172: > 1170: __ convert(Bytecodes::_i2l, index, tmp); > 1171: index = tmp; > 1172: __ add(index, LIR_OprFact::intptrConst(offset), index); I don't think we need this addition. LIR_Address can take both on x86: https://github.com/openjdk/jdk/blob/e90b6bdb875315de6b962e2c7d36606d9a593eb9/src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp#L1104 src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp line 1193: > 1191: __ move(len, arg3); > 1192: > 1193: __ call_runtime_leaf(StubRoutines::updateBytesCRC32C(), LIR_OprFact::illegalOpr, result_reg, cc->args()); Seems like x86 typically uses `getThreadTemp()` instead of `LIR_OprFact::illegalOpr`. 
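As a point of reference for the CRC32-C thread above: the benchmark output measures two Java-level entry points, `CRC32C.update(byte[])` and `CRC32C.update(ByteBuffer)`. The following is a minimal sketch of those two calls using `java.util.zip.CRC32C`. The class and method names (`Crc32cSketch`, `crcOfArray`, `crcOfBuffer`) are hypothetical and are not taken from `TestCRC32C`; the input string is the conventional CRC check string, not the benchmark's message.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

public class Crc32cSketch {
    // Checksum computed through the byte[] overload.
    static long crcOfArray(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Checksum computed through the ByteBuffer overload.
    static long crcOfBuffer(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(ByteBuffer.wrap(data));
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] msg = "123456789".getBytes(StandardCharsets.US_ASCII);
        // Both overloads should agree on the same input.
        System.out.printf("%08x%n", crcOfArray(msg));
        System.out.printf("%08x%n", crcOfBuffer(msg));
    }
}
```

Both overloads compute the same checksum, which is why the benchmark prints the same `crc` value for each variant and compares it to `crcReference`.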
------------- PR Review: https://git.openjdk.org/jdk/pull/23826#pullrequestreview-2670980596 PR Review Comment: https://git.openjdk.org/jdk/pull/23826#discussion_r1987299953 PR Review Comment: https://git.openjdk.org/jdk/pull/23826#discussion_r1987301723 From chagedorn at openjdk.org Mon Mar 10 13:47:58 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 10 Mar 2025 13:47:58 GMT Subject: RFR: 8351414: C2: MergeStores must happen after RangeCheck smearing In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 15:07:37 GMT, Emanuel Peter wrote: > With [JDK-8348959](https://bugs.openjdk.org/browse/JDK-8348959) we see that there can be some issues when RangeCheck smearing happens in the same IGVN phase as MergeStores. It means that some RangeChecks are still around as we do MergeStores, and then we cannot merge as many stores as we would like. We should ensure that RangeCheck smearing happens during post-loop-opts, and then MergeStores happens in a separate dedicated IGVN round afterwards. Looks good! src/hotspot/share/opto/compile.cpp line 1904: > 1902: // StoreI [ StoreL ] StoreI > 1903: // But now it would have been better to do this instead: > 1904: // [ StoreL ] [ StoreL ] Maybe you also can add a note here that RC smearing is not limited to just this one IGVN phase but that it's done in all subsequent IGVN phases (since we don't unset `_merge_stores_phase`). ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23944#pullrequestreview-2671009124 PR Review Comment: https://git.openjdk.org/jdk/pull/23944#discussion_r1987316597 From duke at openjdk.org Mon Mar 10 14:06:36 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Mon, 10 Mar 2025 14:06:36 GMT Subject: RFR: 8350485: C2: factor out common code in Node::grow() and Node::out_grow() [v2] In-Reply-To: References: Message-ID: > Node::grow() and Node::out_grow() are copy-pasted from each other and their core logic could be factored out into a third function or at least cleaned up. Hence, the fix includes a function Node::array_resize() that implements the core logic of Node::grow() and Node::out_grow(). > > Link to the GitHub Actions run, which had no failures: https://github.com/sarannat/jdk/actions/runs/13677508359 Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: 8350485: Addressing review comments on code style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23928/files - new: https://git.openjdk.org/jdk/pull/23928/files/017fc161..f308762b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23928&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23928&range=00-01 Stats: 12 lines in 2 files changed: 0 ins; 1 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/23928.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23928/head:pull/23928 PR: https://git.openjdk.org/jdk/pull/23928 From epeter at openjdk.org Mon Mar 10 15:00:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Mar 2025 15:00:48 GMT Subject: RFR: 8351414: C2: MergeStores must happen after RangeCheck smearing [v2] In-Reply-To: References: Message-ID: <1f0hsdcFse85AAgiDu8X7jvGZoesIjLJZ1GTofgJxWo=.5c2e2f1e-66f5-4597-9602-41fa426b3e48@github.com> > With [JDK-8348959](https://bugs.openjdk.org/browse/JDK-8348959) we see that there can be some issues when RangeCheck smearing happens in the same IGVN phase as
MergeStores. It means that some RangeChecks are still around as we do MergeStores, and then we cannot merge as many stores as we would like. We should ensure that RangeCheck smearing happens during post-loop-opts, and then MergeStores happens in a separate dedicated IGVN round afterwards. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: For Christian, add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23944/files - new: https://git.openjdk.org/jdk/pull/23944/files/4667e4bc..41292a65 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23944&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23944&range=00-01 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23944.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23944/head:pull/23944 PR: https://git.openjdk.org/jdk/pull/23944 From epeter at openjdk.org Mon Mar 10 15:00:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Mar 2025 15:00:49 GMT Subject: RFR: 8351414: C2: MergeStores must happen after RangeCheck smearing [v2] In-Reply-To: <1f0hsdcFse85AAgiDu8X7jvGZoesIjLJZ1GTofgJxWo=.5c2e2f1e-66f5-4597-9602-41fa426b3e48@github.com> References: <1f0hsdcFse85AAgiDu8X7jvGZoesIjLJZ1GTofgJxWo=.5c2e2f1e-66f5-4597-9602-41fa426b3e48@github.com> Message-ID: On Mon, 10 Mar 2025 14:57:45 GMT, Emanuel Peter wrote: >> With [JDK-8348959](https://bugs.openjdk.org/browse/JDK-8348959) we see that there can be some issues when RangeCheck smearing happens in the same IGVN phase as MergeStores. It means that some RangeChecks are still around as we do MergeStores, and then we cannot merge as many stores as we would like. We should ensure that RangeCheck smearing happens during post-loop-opts, and then MergeStores happens in a separate dedicated IGVN round afterwards. 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > For Christian, add comment src/hotspot/share/opto/compile.cpp line 1904: > 1902: // StoreI [ StoreL ] StoreI > 1903: // But now it would have been better to do this instead: > 1904: // [ StoreL ] [ StoreL ] Suggestion: // [ StoreL ] [ StoreL ] // // Note: we allow stores to merge in this dedicated IGVN round, and any later IGVN round, // since we never unset _merge_stores_phase. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23944#discussion_r1987469946 From epeter at openjdk.org Mon Mar 10 15:00:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Mar 2025 15:00:50 GMT Subject: RFR: 8351414: C2: MergeStores must happen after RangeCheck smearing [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 13:44:14 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> For Christian, add comment > > src/hotspot/share/opto/compile.cpp line 1904: > >> 1902: // StoreI [ StoreL ] StoreI >> 1903: // But now it would have been better to do this instead: >> 1904: // [ StoreL ] [ StoreL ] > > Maybe you also can add a note here that RC smearing is not limited to just this one IGVN phase but that it's done in all subsequent IGVN phases (since we don't unset `_merge_stores_phase`). 
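The store pattern discussed in this thread can be pictured at the Java level. Below is a minimal sketch with hypothetical names (`MergeStoresSketch`, `storeIntLE`), not code from the PR or its tests: four adjacent byte stores that write the parts of one `int`. Before optimization, each array store carries its own range check. The thread's point is that those checks should be smeared into one before MergeStores runs, since leftover RangeChecks reduce how many stores can be merged. Whether C2 actually merges these stores depends on the platform and JDK version, so treat this as an illustration of the shape only.

```java
import java.util.Arrays;

public class MergeStoresSketch {
    // Writes v into a[off..off+3] in little-endian order using four
    // adjacent byte stores, each with its own implicit bounds check.
    static void storeIntLE(byte[] a, int off, int v) {
        a[off]     = (byte) v;
        a[off + 1] = (byte) (v >> 8);
        a[off + 2] = (byte) (v >> 16);
        a[off + 3] = (byte) (v >> 24);
    }

    public static void main(String[] args) {
        byte[] a = new byte[8];
        storeIntLE(a, 2, 0x11223344);
        System.out.println(Arrays.toString(a));
    }
}
```

The observable result is the same with or without merging; only the generated machine code differs.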
I added a comment below :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23944#discussion_r1987471099 From chagedorn at openjdk.org Mon Mar 10 15:12:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 10 Mar 2025 15:12:53 GMT Subject: RFR: 8351414: C2: MergeStores must happen after RangeCheck smearing [v2] In-Reply-To: <1f0hsdcFse85AAgiDu8X7jvGZoesIjLJZ1GTofgJxWo=.5c2e2f1e-66f5-4597-9602-41fa426b3e48@github.com> References: <1f0hsdcFse85AAgiDu8X7jvGZoesIjLJZ1GTofgJxWo=.5c2e2f1e-66f5-4597-9602-41fa426b3e48@github.com> Message-ID: On Mon, 10 Mar 2025 15:00:48 GMT, Emanuel Peter wrote: >> With [JDK-8348959](https://bugs.openjdk.org/browse/JDK-8348959) we see that there can be some issues when RangeCheck smearing happens in the same IGVN phase as MergeStores. It means that some RangeChecks are still around as we do MergeStores, and then we cannot merge as many stores as we would like. We should ensure that RangeCheck smearing happens during post-loop-opts, and then MergeStores happens in a separate dedicated IGVN round afterwards. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > For Christian, add comment Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23944#pullrequestreview-2671335086 From sparasa at openjdk.org Mon Mar 10 15:29:35 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 10 Mar 2025 15:29:35 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v7] In-Reply-To: References: Message-ID: <-js5hfCEa4-Q-zrgDHfbpUezOEilZWVapwyh9Pndkuw=.83f142c3-26c9-4fe5-a3d1-44e8f989bf95@github.com> > The goal of this PR is to generate code using APX NDD instructions. 
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: ndd version of cmov eq/ne ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23501/files - new: https://git.openjdk.org/jdk/pull/23501/files/be13918d..630acb8f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=05-06 Stats: 91 lines in 1 file changed: 85 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23501/head:pull/23501 PR: https://git.openjdk.org/jdk/pull/23501 From epeter at openjdk.org Mon Mar 10 16:03:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Mar 2025 16:03:08 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v5] In-Reply-To: References: <0-rhfBkyIO8uh5uioQ4XDoEWGwRdfzah8GJrYvILeDM=.699a73a4-ffd2-42f1-b7df-4c32235b3218@github.com> Message-ID: On Fri, 7 Mar 2025 16:36:51 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/mulnode.cpp line 983: >> >>> 981: static Node* collapse_nested_shift_left(PhaseGVN* phase, Node* outer_shift, int con0, BasicType bt) { >>> 982: assert(bt == T_LONG || bt == T_INT, "Unexpected type"); >>> 983: int nbits = bt == T_LONG ? BitsPerJavaLong : BitsPerJavaInteger; >> >> Roland is introducing a new method for this in `https://github.com/openjdk/jdk/pull/23438`, see `bits_per_java_integer`. I suggest you use it too ;) > > Happily, as soon as this other PR is merged! Or you just copy it and hope that the merge will eventually be clean, up to you. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1987585960 From epeter at openjdk.org Mon Mar 10 16:03:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Mar 2025 16:03:06 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v9] In-Reply-To: <-ea_PqhfAG9kdSWC_MsAKeSjgfUazBWr9QB1f_gJUdo=.170e7391-65c3-4aa6-b2f4-779182997156@github.com> References: <-ea_PqhfAG9kdSWC_MsAKeSjgfUazBWr9QB1f_gJUdo=.170e7391-65c3-4aa6-b2f4-779182997156@github.com> Message-ID: On Fri, 7 Mar 2025 16:11:36 GMT, Marc Chevalier wrote: >> This collapses double left shifts by constants into a single shift by a constant: (x << con1) << con2 => x << (con1 + con2). Care must be taken in the case where con1 + con2 is greater than or equal to the number of bits in the integer type. In this case, we must simplify to 0. >> >> Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32-bit integer into a 16-bit integer, with sign extension. When storing this into a 16-bit field, this can be simplified to simply `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR look like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitudes. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. >> >> This also works for multiplications by powers of 2 since they are already translated into shifts. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Random testing, trying...
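The two rules in the quoted description can be checked at the Java level. The sketch below is an illustration under stated assumptions, not the C2 implementation: the class and method names are hypothetical, it covers only the `int` case, and it assumes both shift constants are already in the range 0..31 (Java masks `int` shift counts to 5 bits, so `x << (con1 + con2)` alone would be wrong once the sum reaches 32).

```java
public class ShiftCollapseSketch {
    // Reference form: two left shifts by constants.
    static int nested(int x, int con1, int con2) {
        return (x << con1) << con2;
    }

    // Collapsed form, assuming 0 <= con1, con2 < 32.
    // Once the sum reaches the bit width, every bit is shifted out.
    static int collapsed(int x, int con1, int con2) {
        int sum = con1 + con2;
        return sum >= Integer.SIZE ? 0 : x << sum;
    }

    // Sign-extension trick: keep the low 16 bits of x, sign-extended.
    static int signExtend16(int x) {
        return (x << 16) >> 16;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x1234, 0x7fffffff, 0x80000000, 0xdeadbeef};
        for (int x : samples) {
            for (int c1 = 0; c1 < 32; c1++) {
                for (int c2 = 0; c2 < 32; c2++) {
                    if (nested(x, c1, c2) != collapsed(x, c1, c2)) {
                        throw new AssertionError("mismatch for " + x + ", " + c1 + ", " + c2);
                    }
                }
            }
            // Storing into a 16-bit slot keeps only the low 16 bits, so the
            // unequal-magnitude form agrees with a single left shift by 3.
            if ((short) ((x << 19) >> 16) != (short) (x << 3)) {
                throw new AssertionError("store mismatch for " + x);
            }
        }
        System.out.println("all checks passed");
    }
}
```

The last check mirrors the `(y << 19) >> 16` example: once the value is truncated by the 16-bit store, the right shift and 16 of the 19 left-shift bits cancel, leaving `y << 3`.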
@marc-chevalier It looks better already! I am wondering if the case-distinction could not be made more explicit. You could first make a case distinction over `conIL ? conIR`, for 1) `=`, 2) `>` and 3) `<`. And then inside 2) further a case distinction over `conIR ? num_rejected_bits`. This would help the reader put the cases together, and make sure that really all cases are covered. What do you think? src/hotspot/share/opto/memnode.cpp line 3529: > 3527: // This case happens when the right-hand side of the store was itself a left shift, that gets merged > 3528: // into the inner left shift of the sign-extension. For instance, if we have > 3529: // array_of_shorts[0] = (short)(X << 2) Nit: The width of the text is a bit inconsistent. It's just a slight visual irritation ;) `right-hand side of the store` I think we usually call this the `value`, as in `store->in(MemNode::ValueIn)`. src/hotspot/share/opto/memnode.cpp line 3536: > 3534: // It is thus useful to handle the case where conIL > conIR. > 3535: // > 3536: // Let's assume we have the following 32 bits integer that we want to stuff in 8 bits char: `char` actually is `16` bits unsigned in Java ;) src/hotspot/share/opto/memnode.cpp line 3541: > 3539: // +------------------------+---------+ > 3540: // 31 8 7 0 > 3541: // v[0..7] is meaningful, but v[8..31] is not. In this case, num_rejected_bits == 24. Would this example not be nice with the original case above? // Check for useless sign-extension before a partial-word store // (StoreB ... (RShiftI _ (LShiftI _ valIn conIL ) conIR) ) // If (conIL == conIR && conIR <= num_bits) this simplifies to // (StoreB ... (valIn) ) Because it seems you are assuming here that `conIL == conIR`, right? And then below you ask what if they are not equal. src/hotspot/share/opto/memnode.cpp line 3543: > 3541: // v[0..7] is meaningful, but v[8..31] is not. In this case, num_rejected_bits == 24. 
> 3542: // Let's study what happens in different cases to see that the simplification into > 3543: // (StoreB ... (LShiftI _ valIn (conIL - conIR)) ) Here you use `valIn`, above sometimes also `X`. Would be nice if it was consistent ;) src/hotspot/share/opto/memnode.cpp line 3546: > 3544: // is valid if: > 3545: // - conIL >= conIR > 3546: // - conIR <= num_rejected_bits Is there also a restriction on `conIL`? src/hotspot/share/opto/memnode.cpp line 3549: > 3547: // Let's also remember that conIL < 32 since (x << 33) is simplified into (x << 1) > 3548: // and (x << 31) << 2 is simplified into 0. This means that in any case, after the > 3549: // left shift, we always have at least one bit of the original v. What does `original v` refer to? Is this the same as the `X` and the `valIn`? src/hotspot/share/opto/memnode.cpp line 3577: > 3575: // +------------------+---------+-----+ > 3576: // 31 8 7 2 1 0 > 3577: // The non-rejected bits are the 8 lower one of (v << conIL - conIR). Suggestion: // The non-rejected bits are the 8 lower ones of (v << conIL - conIR). src/hotspot/share/opto/memnode.cpp line 3579: > 3577: // The non-rejected bits are the 8 lower one of (v << conIL - conIR). > 3578: // The bits 6 and 7 of v have been thrown away after the shift left. > 3579: // The simplification is still fine. Suggestion for everywhere: `fine` -> `valid`. src/hotspot/share/opto/memnode.cpp line 3581: > 3579: // The simplification is still fine. > 3580: // > 3581: // ### Case 3: conIL > conIR < num_rejected_bits. Suggestion: // ### Case 3: conIL > conIR Or do you need that? And if so, do we only have `conIR < num_rejected_bits`, or also `conIL < num_rejected_bits`. A combination of `>` and `<` in the same equation can be a little confusing ;) src/hotspot/share/opto/memnode.cpp line 3593: > 3591: // +------------------+---------+-----+ > 3592: // 31 10 9 4 3 0 > 3593: // The non-rejected bits are the 8 lower one of (v << conIL - conIR). 
Suggestion: // The non-rejected bits are the 8 lower ones of (v << conIL - conIR). But it seems we only actually kept 6 of them? src/hotspot/share/opto/memnode.cpp line 3595: > 3593: // The non-rejected bits are the 8 lower one of (v << conIL - conIR). > 3594: // The bits 6 and 7 of v have been thrown away after the shift left. > 3595: // The bits 4 and 5 of v are still present, but outside of the kept bits (the 8 lower ones). I have trouble understanding this line. Bits 0-5 are still present. What do you mean by "outside of the kept bits"? src/hotspot/share/opto/memnode.cpp line 3621: > 3619: // Valid if conIL >= conIR <= num_rejected_bits > 3620: // > 3621: // We do not treat the case conIR > conIL here since the point of this function is Suggestion: // We do not treat the case conIL < conIR here since the point of this function is Nit: just to keep the two values on the sides we have used up to now. src/hotspot/share/opto/memnode.cpp line 3632: > 3630: if (shr->Opcode() == Op_RShiftI) { > 3631: const TypeInt* conIR = phase->type(shr->in(2))->isa_int(); > 3632: if (conIR != nullptr && conIR->is_con() && conIR->get_con() <= num_rejected_bits) { How do we know that `conIR` (and also `conIL`) are not negative? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23728#pullrequestreview-2671375553 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1987520014 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1987527427 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1987538108 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1987540449 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1987575192 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1987543065 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1987554826 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1987555725 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1987561323 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1987562830 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1987567798 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1987579919 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1987584042 From epeter at openjdk.org Mon Mar 10 16:03:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Mar 2025 16:03:07 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v9] In-Reply-To: References: <-ea_PqhfAG9kdSWC_MsAKeSjgfUazBWr9QB1f_gJUdo=.170e7391-65c3-4aa6-b2f4-779182997156@github.com> Message-ID: On Mon, 10 Mar 2025 15:31:29 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Random testing, trying... > > src/hotspot/share/opto/memnode.cpp line 3541: > >> 3539: // +------------------------+---------+ >> 3540: // 31 8 7 0 >> 3541: // v[0..7] is meaningful, but v[8..31] is not. In this case, num_rejected_bits == 24. > > Would this example not be nice with the original case above? 
> > // Check for useless sign-extension before a partial-word store > // (StoreB ... (RShiftI _ (LShiftI _ valIn conIL ) conIR) ) > // If (conIL == conIR && conIR <= num_bits) this simplifies to > // (StoreB ... (valIn) ) > > Because it seems you are assuming here that `conIL == conIR`, right? And then below you ask what if they are not equal. It could also be nice to introduce the `num_rejected_bits` somewhere. > src/hotspot/share/opto/memnode.cpp line 3579: > >> 3577: // The non-rejected bits are the 8 lower one of (v << conIL - conIR). >> 3578: // The bits 6 and 7 of v have been thrown away after the shift left. >> 3579: // The simplification is still fine. > > Suggestion for everywhere: `fine` -> `valid`. Or correct. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1987552656 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1987555990 From dlunden at openjdk.org Mon Mar 10 16:20:14 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 10 Mar 2025 16:20:14 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v8] In-Reply-To: References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: On Tue, 4 Mar 2025 07:09:34 GMT, Daniel Lundén wrote: >> When searching for load anti-dependences in GCM, the memory state for the load is sometimes represented not only by the memory node input of the load, but also other memory nodes. Because PhaseCFG::insert_anti_dependences searches for anti-dependences only from the load's memory input, it is, therefore, possible to sometimes overlook anti-dependences. The result is that loads are potentially scheduled too late, after stores that redefine the memory states of the loads. >> >> ### Changeset >> >> It is not yet clear why multiple nodes sometimes represent the memory state of a load, nor if this is expected.
We can, however, resolve all the miscompiled test cases seen in this issue by improving the idealization of Phi nodes. Specifically, there is an idealization where we split Phis through input MergeMems, that we, prior to this changeset, applied too conservatively. >> >> To illustrate the idealization and how it resolves this issue, consider the example below. >> >> ![failure-graph-1](https://github.com/user-attachments/assets/ecbd204f-bdf0-49cb-a62e-8081d08cfe0c) >> >> `64 membar_release` is a critical anti-dependence for `183 loadI`. The anti-dependence search starts at the load's direct memory input, `107 Phi`, and stops immediately at Phis. Therefore, the search ends at `106 Phi` and we never find `64 membar_release`. >> >> We can apply the split-through-MergeMem Phi idealization to `119 Phi`. This idealization pushes `119 Phi` through `120 MergeMem` and `121 MergeMem`, splitting it into the individual inputs of the MergeMems in the process. As a result, we replace `119 Phi` with two new Phis. One of these generated Phis has identical inputs to `107 Phi` (`106 Phi` and `104 Phi`), and further idealizations will merge this new Phi and `107 Phi`. As a result, `107 Phi` then has a Phi-free path to `64 membar_release` and we correctly discover the anti-dependence. >> >> The changeset consists of the following changes. >> - Add an analysis that allows applying the split-through-MergeMem idealization in more cases than before (including in the above example) while still ensuring termination. >> - Add a missing `ResourceMark` in `PhiNode::split_out_instance`. >> - Add multiple new regression tests in `TestGCMLoadPlacement.java`. >> >> For reference, [here](https://github.com/openjdk/jdk/pull/22852) is a previous PR with an alternative fix that we decided to discard in favor of the fix in this PR. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/ac... 
> > Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into insert-anti-dependences-8333393+igvn+pr > - Update missing copyright > - Change to GrowableArray > - Update after Christian's review > - Fix subtle bug introduced in previous update > - Update after review comments > - Remove test that no longer reproduces the issue > - First version Integrating this now. Thanks for the reviews and contributions everyone! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23691#issuecomment-2711131860 From dlunden at openjdk.org Mon Mar 10 16:20:14 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 10 Mar 2025 16:20:14 GMT Subject: Integrated: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges In-Reply-To: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> References: <2HzvnZfO23KmMBnTXVx1fi3xeOCjeGlHFVsTijaFK7c=.0d511c73-505b-4db1-8622-6e823a1e2f0a@github.com> Message-ID: <9JWlNsr50pHMnw2LyyhSz9YtjwmCO8JTxRgtvuxVzic=.e4e79a36-c5bf-4d97-a0e9-21fe316b3d0d@github.com> On Wed, 19 Feb 2025 09:52:41 GMT, Daniel Lundén wrote: > When searching for load anti-dependences in GCM, the memory state for the load is sometimes represented not only by the memory node input of the load, but also other memory nodes. Because PhaseCFG::insert_anti_dependences searches for anti-dependences only from the load's memory input, it is, therefore, possible to sometimes overlook anti-dependences. The result is that loads are potentially scheduled too late, after stores that redefine the memory states of the loads. 
> > ### Changeset > > It is not yet clear why multiple nodes sometimes represent the memory state of a load, nor if this is expected. We can, however, resolve all the miscompiled test cases seen in this issue by improving the idealization of Phi nodes. Specifically, there is an idealization where we split Phis through input MergeMems, that we, prior to this changeset, applied too conservatively. > > To illustrate the idealization and how it resolves this issue, consider the example below. > > ![failure-graph-1](https://github.com/user-attachments/assets/ecbd204f-bdf0-49cb-a62e-8081d08cfe0c) > > `64 membar_release` is a critical anti-dependence for `183 loadI`. The anti-dependence search starts at the load's direct memory input, `107 Phi`, and stops immediately at Phis. Therefore, the search ends at `106 Phi` and we never find `64 membar_release`. > > We can apply the split-through-MergeMem Phi idealization to `119 Phi`. This idealization pushes `119 Phi` through `120 MergeMem` and `121 MergeMem`, splitting it into the individual inputs of the MergeMems in the process. As a result, we replace `119 Phi` with two new Phis. One of these generated Phis has identical inputs to `107 Phi` (`106 Phi` and `104 Phi`), and further idealizations will merge this new Phi and `107 Phi`. As a result, `107 Phi` then has a Phi-free path to `64 membar_release` and we correctly discover the anti-dependence. > > The changeset consists of the following changes. > - Add an analysis that allows applying the split-through-MergeMem idealization in more cases than before (including in the above example) while still ensuring termination. > - Add a missing `ResourceMark` in `PhiNode::split_out_instance`. > - Add multiple new regression tests in `TestGCMLoadPlacement.java`. > > For reference, [here](https://github.com/openjdk/jdk/pull/22852) is a previous PR with an alternative fix that we decided to discard in favor of the fix in this PR. 
> > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/13394882532) > - `tier1` to `tier4` (an... This pull request has now been integrated. Changeset: b40be225 Author: Daniel Lundén URL: https://git.openjdk.org/jdk/commit/b40be22512a8d3b3350fef8d6668d80134a6f1a6 Stats: 366 lines in 3 files changed: 356 ins; 1 del; 9 mod 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges Co-authored-by: Roberto Castañeda Lozano Co-authored-by: Christian Hagedorn Co-authored-by: Tobias Hartmann Co-authored-by: Emanuel Peter Co-authored-by: Quan Anh Mai Reviewed-by: rcastanedalo, chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/23691 From jiangli at openjdk.org Mon Mar 10 17:05:07 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Mon, 10 Mar 2025 17:05:07 GMT Subject: RFR: 8350982: -server|-client causes fatal exception on static JDK In-Reply-To: <1UivDxf7iNhuhkTsh0S60VECAnkYkH4HQrYMmlCrZy0=.179c743b-4495-498c-b4ab-9fc0efce9467@github.com> References: <1UivDxf7iNhuhkTsh0S60VECAnkYkH4HQrYMmlCrZy0=.179c743b-4495-498c-b4ab-9fc0efce9467@github.com> Message-ID: On Tue, 4 Mar 2025 18:48:28 GMT, Alan Bateman wrote: >> Please review the `Arguments::parse_each_vm_init_arg` change to ignore `-server|-client` options, which avoids unrecognized option error on static JDK. >> >> On regular JDK, '-server|-client' options are processed/removed from command-line arguments by `CheckJvmType` during `CreateExecutionEnvironment`. That happens before `Arguments::parse_each_vm_init_arg` is called. With jvm.cfg setting, only server vm is known and client is ignored. So specifying '-server' and '-client' in command-line is really a no-op. >> >> On static JDK, the VM is statically linked with the launcher, and `CreateExecutionEnvironment` & `CheckJvmType` are not called. 
As a result, `Arguments::parse_each_vm_init_arg` could see `-server|-client` when running on static JDK, if the options are specified in the command line. > > Jiangli and I chatted about this today. We don't think there will be developers looking to specify -server or -client to a static image, instead this is more about the tests. So we think the best thing is to look at the tests that still specify -server and see if it can be dropped. Some of the tests (say for C2) might be better off using `@requires vm.compiler2.enabled` or `@requires vm.flavor == "server"`. @AlanBateman @dholmes-ora @iklam Do you have any other comments/questions about the change? @vnkozlov or others from compiler side, can you please take a look at the change as well? Thanks ------------- PR Comment: https://git.openjdk.org/jdk/pull/23881#issuecomment-2711261297 From mli at openjdk.org Mon Mar 10 18:02:33 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 10 Mar 2025 18:02:33 GMT Subject: RFR: 8351345: [IR Framework] Improve reported disabled IR verification messages [v3] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this trivial patch? > Currently, it only reports that some flag is non-whitelisted, but does not print out the flag explicitly, but it's helpful to do so. > > Thanks! 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: refine ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23931/files - new: https://git.openjdk.org/jdk/pull/23931/files/97a68e2e..62cc8dbc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23931&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23931&range=01-02 Stats: 18 lines in 1 file changed: 7 ins; 2 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23931.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23931/head:pull/23931 PR: https://git.openjdk.org/jdk/pull/23931 From mli at openjdk.org Mon Mar 10 18:02:33 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 10 Mar 2025 18:02:33 GMT Subject: RFR: 8351345: [IR Framework] Improve reported disabled IR verification messages [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 07:51:57 GMT, Christian Hagedorn wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> refine > > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 602: > >> 600: boolean debugTest, irTest, nonWhiteListedTest; >> 601: >> 602: debugTest = Platform.isDebugBuild() && !Platform.isInt() && !Platform.isComp(); > > I suggest to split this up into three separate checks which generate three separate messages. OK, fixed. > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 615: > >> 613: System.out.println("IR verification disabled due to:"); >> 614: if (!debugTest) { >> 615: System.out.println("\tnot running a debug build (required for PrintIdeal and PrintOptoAssembly), " + > > I like the improvements so far. But I think we should not use tabs since the size could be specific to the machine we are running on. I suggest to use `-` instead which could use the following structure: > > IR verification disabled due to the following reason(s): > - Reason 1 > - Reason 2 > ... 
> - Using non-whitelisted JTreg VM or Javaoptions flag(s): > - Non-whitelisted flag 1 > - Non-whitelisted flag 2 > - ... Make sense to me, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23931#discussion_r1987777527 PR Review Comment: https://git.openjdk.org/jdk/pull/23931#discussion_r1987778100 From alanb at openjdk.org Mon Mar 10 19:57:00 2025 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 10 Mar 2025 19:57:00 GMT Subject: RFR: 8350982: -server|-client causes fatal exception on static JDK [v2] In-Reply-To: <2fluPJGNiu9SvOwq6MfyLch7lChTjPlOJh7dcrXxfa4=.55bdfd48-08dd-4f14-b683-88fccbc66e1e@github.com> References: <2fluPJGNiu9SvOwq6MfyLch7lChTjPlOJh7dcrXxfa4=.55bdfd48-08dd-4f14-b683-88fccbc66e1e@github.com> Message-ID: On Thu, 6 Mar 2025 02:18:43 GMT, Jiangli Zhou wrote: >> Please review the `Arguments::parse_each_vm_init_arg` change to ignore`-server|-client` options, which avoids unrecognized option error on static JDK. >> >> On regular JDK, '-server|-client' options are processed/removed from command-line arguments by `CheckJvmType` during `CreateExecutionEnvironment`. That happens before `Arguments::parse_each_vm_init_arg` is called. With jvm.cfg setting, only server vm is known and client is ignored. So specifying '-server' and '-client' in command-line is really a no-op. >> >> On static JDK, the VM is statically linked with the launcher, and `CreateExecutionEnvironment` & `CheckJvmType` are not called. As the result, `Arguments::parse_each_vm_init_arg` could see `-server|-client` when running on static JDK, if the options are specified in the command line. > > Jiangli Zhou has updated the pull request incrementally with two additional commits since the last revision: > > - Remove '-server' from all following tests. 
> > Add @requires vm.flavor == "server" & !vm.emulatedClient since these tests run on c2: > - compiler/c2/TestReduceAllocationAndHeapDump.java > - compiler/inlining/InlineBimorphicVirtualCallAfterMorphismChanged.java > > These tests already have @requires vm.compiler2.enabled: > - compiler/c2/TestReduceAllocationAndLoadKlass.java > - compiler/c2/TestReduceAllocationAndNonExactAllocate.java > - compiler/c2/TestReduceAllocationAndNullableLoads.java > - compiler/c2/TestReduceAllocationAndPointerComparisons.java > - compiler/escapeAnalysis/TestIterativeEA.java > > Can run on c1/c2: > - compiler/escapeAnalysis/TestReduceAllocationAndNonReduciblePhi.java > > Already have @requires vm.flavor == "server": > - compiler/intrinsics/math/TestMinMaxIntrinsics.java > - compiler/profiling/TestTypeProfiling.java > - gc/stress/gcbasher/TestGCBasherWithG1.java > - gc/stress/gcbasher/TestGCBasherWithParallel.java > - gc/stress/gcbasher/TestGCBasherWithSerial.java > > Not compiler specific: > - runtime/CDSCompressedKPtrs/XShareAuto.java > - Revert src/hotspot/share/runtime/arguments.cpp. Marked as reviewed by alanb (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23881#pullrequestreview-2672048988 From alanb at openjdk.org Mon Mar 10 19:57:01 2025 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 10 Mar 2025 19:57:01 GMT Subject: RFR: 8350982: -server|-client causes fatal exception on static JDK In-Reply-To: <1UivDxf7iNhuhkTsh0S60VECAnkYkH4HQrYMmlCrZy0=.179c743b-4495-498c-b4ab-9fc0efce9467@github.com> References: <1UivDxf7iNhuhkTsh0S60VECAnkYkH4HQrYMmlCrZy0=.179c743b-4495-498c-b4ab-9fc0efce9467@github.com> Message-ID: <7YqG7ryKpJEk6hBnKnTj3Y_O2p_9x7XfT0M6jNw7nQg=.c5b75d99-9ce0-4898-a174-3646cfc4402a@github.com> On Tue, 4 Mar 2025 18:48:28 GMT, Alan Bateman wrote: >> Please review the `Arguments::parse_each_vm_init_arg` change to ignore `-server|-client` options, which avoids unrecognized option error on static JDK. 
>> >> On regular JDK, '-server|-client' options are processed/removed from command-line arguments by `CheckJvmType` during `CreateExecutionEnvironment`. That happens before `Arguments::parse_each_vm_init_arg` is called. With jvm.cfg setting, only server vm is known and client is ignored. So specifying '-server' and '-client' in command-line is really a no-op. >> >> On static JDK, the VM is statically linked with the launcher, and `CreateExecutionEnvironment` & `CheckJvmType` are not called. As a result, `Arguments::parse_each_vm_init_arg` could see `-server|-client` when running on static JDK, if the options are specified in the command line. > > Jiangli and I chatted about this today. We don't think there will be developers looking to specify -server or -client to a static image, instead this is more about the tests. So we think the best thing is to look at the tests that still specify -server and see if it can be dropped. Some of the tests (say for C2) might be better off using `@requires vm.compiler2.enabled` or `@requires vm.flavor == "server"`. > @AlanBateman @dholmes-ora @iklam Do you have any other comments/questions about the change? @vnkozlov or others from compiler side, can you please take a look at the change as well? Thanks I wasn't initially sure about XShareAuto.java but I see the exchange between you and Ioi so I think all good. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23881#issuecomment-2711673676 From vlivanov at openjdk.org Mon Mar 10 20:51:54 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 10 Mar 2025 20:51:54 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v5] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 09:49:26 GMT, Damon Fenacci wrote: >> # Issue >> >> The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2 compiling the test `testAndMaskSameValue1` expects to have 1 `AndV` node but it has none. >> >> # Cause >> >> The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type. >> The graph that leads to the issue looks like this: >> ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) >> The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: >> ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) >> The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead. >> The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`. >> >> This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match. 
>> >> # Solution >> >> Instead of fixing this specific case we try a more generic approach: when late inlining we keep track of failed intrinsics and re-examine them during IGVN. If the `Ideal` method for their call node is called, we reschedule the intrinsic attempt for that call. >> >> # Testing >> >> Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) >> >> Tests: Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) > > Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: > > - JDK-8302459: refactor helper method > - JDK-8302459: reshape infinite loop check Looks good! ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21682#pullrequestreview-2672168186 From sviswanathan at openjdk.org Mon Mar 10 21:26:41 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 10 Mar 2025 21:26:41 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input [v2] In-Reply-To: References: Message-ID: > Float.float16ToFloat generates wrong vectorized code in product build and asserts in fastdebug/debug when argument is of type byte, int, or long array. The short term solution is to not auto vectorize in these cases. > > Review comments are welcome. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23939/files - new: https://git.openjdk.org/jdk/pull/23939/files/4ebb47a7..70ab0acc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23939&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23939&range=00-01 Stats: 35 lines in 1 file changed: 32 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23939.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23939/head:pull/23939 PR: https://git.openjdk.org/jdk/pull/23939 From sviswanathan at openjdk.org Mon Mar 10 21:26:41 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 10 Mar 2025 21:26:41 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input [v2] In-Reply-To: References: <1TIUz8O4xjCwMArQYWlPJ4qRR9SBVpux0cceH9m2X5k=.521532a4-9ea7-4031-aa98-a60ce2c8982a@github.com> Message-ID: On Mon, 10 Mar 2025 10:13:55 GMT, Emanuel Peter wrote: >> Thanks a lot @vnkozlov for the review and approval. > > @sviswa7 thanks for looking at this! The fix looks good, there are just a few comments about the test :) @eme64 @jatin-bhateja Your review comments are addressed, please take a look. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23939#issuecomment-2711874113 From sviswanathan at openjdk.org Mon Mar 10 21:32:00 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 10 Mar 2025 21:32:00 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 08:56:17 GMT, Emanuel Peter wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments > > test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 29: > >> 27: * @bug 8350835 >> 28: * @summary Test bug fix for JDK-8350835 discovered through Template Framework >> 29: * @requires vm.compiler2.enabled > > Suggestion: > > > Is this restriction necessary? I generally prefer running tests on all platforms, and only restricting IR rules. That way we can get more test coverage with result verification. The restriction is not necessary, removed. > test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 31: > >> 29: * @requires vm.compiler2.enabled >> 30: * @library /test/lib / >> 31: * @run main/othervm -XX:-TieredCompilation -XX:CompileOnly=compiler.vectorization.TestFloat16ToFloatConv::test* compiler.vectorization.TestFloat16ToFloatConv > > Are the additional flags really necessary for reproducing the bug? I would suspect not really. The IR framework already takes care of ensuring we run with C2 compilation. Removed the additional flags. > test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 47: > >> 45: private static int[] aI = new int[SIZE]; >> 46: private static long[] aL = new long[SIZE]; >> 47: private static float[] goldB, goldS, goldI, goldL; > > Are you testing for `char` as well? Added test for char. 
> test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 55: > >> 53: aI[i] = RANDOM.nextInt(); >> 54: aL[i] = RANDOM.nextLong(); >> 55: } > > I would prefer if we could start using `Generators`. There is a `fill` method for arrays. It generates more "interesting" values. It is not super relevant here, but it would be nice if we made this common practice now ;) The Generators don't support the bytes, shorts, and chars yet so ended up not using them for this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1988049345 PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1988048883 PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1988048633 PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1988048278 From sviswanathan at openjdk.org Mon Mar 10 21:32:01 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 10 Mar 2025 21:32:01 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input [v2] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 07:51:24 GMT, Jatin Bhateja wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments > > test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 89: > >> 87: } >> 88: >> 89: @Test > > ``` suggestion > /* > * C2 handles this in two steps: l2i handling creates ConvL2I IR, followed by i2s conversion which constrains the > * value range of the integral argument; thus, the argument fed to ConvHF2F is of type T_INT. Fix for > * JDK-8350835 skips over vectorizing such a case for now. > */ > @Test > @IR(failOn = { IRNode.VECTOR_CAST_HF2F }, applyIfCPUFeatureOr = { "avx512vl", "true", "f16c", "true" }) Thanks, added the generic failOn tests for byte, int, and long test cases as it is applicable across architectures. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1988052085 From sviswanathan at openjdk.org Mon Mar 10 21:34:54 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 10 Mar 2025 21:34:54 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input [v2] In-Reply-To: References: Message-ID: <7EI9O5thx8AyGis3mApSdjQv9ral_gTncOaybCykXuA=.672a0c2f-17a3-47af-a53b-66c7456b05eb@github.com> On Fri, 7 Mar 2025 07:48:41 GMT, Jatin Bhateja wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments > > test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 71: > >> 69: } >> 70: >> 71: @Test > > Suggestion: > > @Test > @IR(counts = { IRNode.VECTOR_CAST_HF2F, " >0 " }, applyIfCPUFeatureOr = { "avx512vl", "true", "f16c", "true" }) Used the IR test from compiler/vectorization/TestFloatConversionsVector.java to include other architectures as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1988056379 From eastigeevich at openjdk.org Mon Mar 10 22:18:52 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 10 Mar 2025 22:18:52 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 22:23:23 GMT, Evgeny Astigeevich wrote: > This benchmark is used to check performance impact of the code cache being sparse. > > We use C2 compiler to compile the same Java method multiple times to produce as many code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. 
CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods. > > Results: code region size 2M (2097152) bytes > - Intel Xeon Platinum 8259CL > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |19.577 |0.619 |us/op | | > |128 |32 |4 |22.968 |0.314 |us/op |17.30% | > |128 |48 |3 |22.245 |0.388 |us/op |13.60% | > |128 |64 |2 |23.874 |0.84 |us/op |21.90% | > |128 |80 |2 |23.786 |0.231 |us/op |21.50% | > |128 |96 |1 |26.224 |1.16 |us/op |34% | > |128 |112 |1 |27.028 |0.461 |us/op |38.10% | > |256 |1 |256 |47.43 |1.146 |us/op | | > |256 |32 |8 |63.962 |1.671 |us/op |34.90% | > |256 |48 |5 |63.396 |0.247 |us/op |33.70% | > |256 |64 |4 |66.604 |2.286 |us/op |40.40% | > |256 |80 |3 |59.746 |1.273 |us/op |26% | > |256 |96 |3 |63.836 |1.034 |us/op |34.60% | > |256 |112 |2 |63.538 |1.814 |us/op |34% | > |512 |1 |512 |172.731 |4.409 |us/op | | > |512 |32 |16 |206.772 |6.229 |us/op |19.70% | > |512 |48 |11 |215.275 |2.228 |us/op |24.60% | > |512 |64 |8 |212.962 |2.028 |us/op |23.30% | > |512 |80 |6 |201.335 |12.519 |us/op |16.60% | > |512 |96 |5 |198.133 |6.502 |us/op |14.70% | > |512 |112 |5 |193.739 |3.812 |us/op |12.20% | > |768 |1 |768 |325.154 |5.048 |us/op | | > |768 |32 |24 |346.298 |20.196 |us/op |6.50% | > |768 |48 |16 |350.746 |2.931 |us/op |7.90% | > |768 |64 |12 |339.445 |7.927 |us/op |4.40% | > |768 |80 |10 |347.408 |7.355 |us/op |6.80% | > |768 |96 |8 |340.983 |3.578 |us/op |4.90% | > |768 |112 |7 |353.949 |2.98 |us/op |8.90% | > |1024 |1 |1024 |368.352 |5.961 |us/op | | > |1024 |32 |32 |463.822 |6.274 |us/op |25.90% | > |1024 |48 |21 |457.674 |15.144 |us/op |24.20% | > |1024 |64 |16 |477.694 |0.986 |us/op |29.70% | > |1024 |80 |13 |484.901 |32.601 |us/op |31.60% | > |1024 |96 |11 |480.8 |27.088 |us/op |30.50% | > |1024 |112 |9 |474.416 |10.053 |us/op |28.80% | > > - AArch64 Neoverse N1 > > |activeMethodCount |groupCount 
|Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |25.297 |0.792 |us/op | | > |128 |32 |4 |31.451... @dean-long > On Neoverse, what's the size of a region I don't find anything about this in Neoverse docs. Although in Arm Neoverse N1 Software Optimization Guide, 4.9 Branch instruction alignment, I found: > Branch instruction and branch target instruction alignment and density can affect performance. > For best-case performance, consider the following guidelines. > ... > - When possible, a branch and its target should be located within the same 2M aligned memory region. > > Consider aligning subroutine entry points and branch targets to 32B boundaries, within the bounds of the code-density requirements of the program. This is the place where I've got an idea of 2M used in the benchmark. > why must it split the code into separate regions at all? According to the Arm blog post, this is the front-end and related to the branch predictor. So I guess it helps to predict targets. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2711970096 From dlong at openjdk.org Mon Mar 10 23:34:52 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 10 Mar 2025 23:34:52 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 22:23:23 GMT, Evgeny Astigeevich wrote: > This benchmark is used to check performance impact of the code cache being sparse. > > We use C2 compiler to compile the same Java method multiple times to produce as many code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods. 
> > Results: code region size 2M (2097152) bytes > - Intel Xeon Platinum 8259CL > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |19.577 |0.619 |us/op | | > |128 |32 |4 |22.968 |0.314 |us/op |17.30% | > |128 |48 |3 |22.245 |0.388 |us/op |13.60% | > |128 |64 |2 |23.874 |0.84 |us/op |21.90% | > |128 |80 |2 |23.786 |0.231 |us/op |21.50% | > |128 |96 |1 |26.224 |1.16 |us/op |34% | > |128 |112 |1 |27.028 |0.461 |us/op |38.10% | > |256 |1 |256 |47.43 |1.146 |us/op | | > |256 |32 |8 |63.962 |1.671 |us/op |34.90% | > |256 |48 |5 |63.396 |0.247 |us/op |33.70% | > |256 |64 |4 |66.604 |2.286 |us/op |40.40% | > |256 |80 |3 |59.746 |1.273 |us/op |26% | > |256 |96 |3 |63.836 |1.034 |us/op |34.60% | > |256 |112 |2 |63.538 |1.814 |us/op |34% | > |512 |1 |512 |172.731 |4.409 |us/op | | > |512 |32 |16 |206.772 |6.229 |us/op |19.70% | > |512 |48 |11 |215.275 |2.228 |us/op |24.60% | > |512 |64 |8 |212.962 |2.028 |us/op |23.30% | > |512 |80 |6 |201.335 |12.519 |us/op |16.60% | > |512 |96 |5 |198.133 |6.502 |us/op |14.70% | > |512 |112 |5 |193.739 |3.812 |us/op |12.20% | > |768 |1 |768 |325.154 |5.048 |us/op | | > |768 |32 |24 |346.298 |20.196 |us/op |6.50% | > |768 |48 |16 |350.746 |2.931 |us/op |7.90% | > |768 |64 |12 |339.445 |7.927 |us/op |4.40% | > |768 |80 |10 |347.408 |7.355 |us/op |6.80% | > |768 |96 |8 |340.983 |3.578 |us/op |4.90% | > |768 |112 |7 |353.949 |2.98 |us/op |8.90% | > |1024 |1 |1024 |368.352 |5.961 |us/op | | > |1024 |32 |32 |463.822 |6.274 |us/op |25.90% | > |1024 |48 |21 |457.674 |15.144 |us/op |24.20% | > |1024 |64 |16 |477.694 |0.986 |us/op |29.70% | > |1024 |80 |13 |484.901 |32.601 |us/op |31.60% | > |1024 |96 |11 |480.8 |27.088 |us/op |30.50% | > |1024 |112 |9 |474.416 |10.053 |us/op |28.80% | > > - AArch64 Neoverse N1 > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |25.297 |0.792 |us/op | | > 
|128 |32 |4 |31.451... Interesting. So there may be a "density" performance penalty even for short branch ranges if the source and destination are not in the same region. So for example a branch at 0x1ffff0 and its target at 0x200000 are not in the same region, even though the distance is only 0x10. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2712078132 From dlong at openjdk.org Mon Mar 10 23:45:57 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 10 Mar 2025 23:45:57 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v14] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Thu, 6 Mar 2025 12:15:52 GMT, Boris Ulasevich wrote: >> This change relocates mutable data (such as relocations, metadata, jvmci data) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data out of the CodeCache. >> >> OOPs was initially moved to a new mutable data blob, but then moved back to nmethod due to performance issues on dacapo benchmarks on aarch with ShenandoahGC (why Shenandoah: it is the only GC with supports_instruction_patching=false - it requires loading from the oops table in compiled code, which takes three instructions for a remote data). >> >> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1–2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. >> >> The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of nmethod. 
Example (statistics collected on the CodeCacheStress benchmark): >> - nmethod_count:134000, total_compilation_time: 510460ms >> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, >> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB >> >> Functional testing: jtreg on arm/aarch64/x86. >> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. >> >> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. > > Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - swap metadata and jvmci data in outputs according to data layout > - cleanup > - returning oops back to nmethods. jtreg: Ok, performance: Ok. todo: cleanup > - Address review comments: cleanup, move fields to avoid padding, fix CodeBlob purge to call os::free, fix nmethod::print, update Layout description > - add a separate adrp_movk function to support targets located more than 4GB away > - Force the use of movk in combination with adrp and ldr instructions to address scenarios > where os::malloc allocates buffers beyond the typical ±4GB range accessible with adrp > - Fixing TestFindInstMemRecursion test fail with -XX:+StressReflectiveCode option: > _relocation_size can exceed 64Kb, in this case _metadata_offset does not fit into int16. > Fix: use _oops_size int16 field to calculate metadata offset > - removing dead code > - a bit of cleanup and addressing review suggestions > - rework movoop for not_supports_instruction_patching case: correcting in ldr_constant and relocations fixup > - ... and 5 more: https://git.openjdk.org/jdk/compare/cfab88b1...bc8c590c Still looks good. ------------- Marked as reviewed by dlong (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21276#pullrequestreview-2672408582 From dlong at openjdk.org Tue Mar 11 00:04:55 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 11 Mar 2025 00:04:55 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: Message-ID: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> On Mon, 10 Mar 2025 13:37:23 GMT, Matthias Baesken wrote: > When running jtreg tests on macOS aarch64 with ubsan - enabled binaries, in the test > java/foreign/TestHandshake > this error/warning is reported : > > jdk/src/hotspot/share/opto/block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' > UndefinedBehaviorSanitizer:DEADLYSIGNAL > UndefinedBehaviorSanitizer: nested bug in the same thread, aborting. > > Seems it happens in this calculation (float value does not fit into an int) : int to_pct = (int) ((100 * freq) / target->_freq); src/hotspot/share/opto/block.cpp line 1617: > 1615: float f_from_pct = (100 * freq) / b->_freq; > 1616: float f_to_pct = (100 * freq) / target->_freq; > 1617: int from_pct = (f_from_pct < (float)INT_MAX) ? (int)f_from_pct : INT_MAX; I think (float)INT_MAX is problematic. Due to rounding, isn't the result actually greater than INT_MAX? Does it even make sense to have a "pct" that is greater than 100 here? Do we want `int from_pct = MIN2((double)INT_MAX, (double)f_from_pct);` or maybe `int from_pct = MIN2((100.0, (double)f_from_pct);`? 
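To illustrate the rounding question raised here, the following is a minimal sketch, not the HotSpot code. It assumes a 32-bit `int` and IEEE-754 `float` with round-to-nearest; `int_max_as_float` and `saturating_pct` are hypothetical names introduced only for this example. Since `float` has a 24-bit significand, `(float)INT_MAX` is not 2147483647 but 2^31. A strict `<` comparison against that value still keeps the subsequent cast in range, because the largest float below 2^31 is 2147483520; a `<=` comparison would not.

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// Hypothetical helper: shows what (float)INT_MAX actually holds.
// Assumption: 32-bit int, IEEE-754 float, round-to-nearest.
inline double int_max_as_float() {
  return static_cast<double>(static_cast<float>(INT_MAX));
}

// Hypothetical helper: convert a float "percentage" to int, saturating
// instead of performing an out-of-range float-to-int conversion (which is
// undefined behaviour in C++ and is what ubsan reports).
inline int saturating_pct(float f) {
  constexpr float two_pow_31 = 2147483648.0f;  // exactly representable
  if (std::isnan(f)) {
    return 0;                 // arbitrary choice for this sketch
  }
  if (f >= two_pow_31) {
    return INT_MAX;           // (int)f would overflow here
  }
  if (f <= -two_pow_31) {
    return INT_MIN;           // -2^31 maps to INT_MIN, smaller values saturate
  }
  return static_cast<int>(f); // in range: truncates toward zero
}
```

Whether a percentage above 100 is meaningful at this point in block.cpp is a separate question that the sketch does not try to answer.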
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r1988172603 From duke at openjdk.org Tue Mar 11 01:55:04 2025 From: duke at openjdk.org (duke) Date: Tue, 11 Mar 2025 01:55:04 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value [v14] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 02:45:55 GMT, kuaiwei wrote: >> This patch enhance MergeStores optimization to support merge value with reverse byte order. >> >> Below is benchmark result before and after the patch: >> >> On aliyun g8y (aarch64) >> |name | before | score2 | ratio | >> |---|---|---|---| >> |MergeStoreBench.setCharBS |5669.655000 |5669.566000 | 0.00 %| >> |MergeStoreBench.setCharBV |5516.911000 |5516.273000 | 0.01 %| >> |MergeStoreBench.setCharC |5578.644000 |5552.809000 | 0.47 %| >> |MergeStoreBench.setCharLS |5782.140000 |5779.264000 | 0.05 %| >> |MergeStoreBench.setCharLV |5496.403000 |5499.195000 | -0.05 %| >> |MergeStoreBench.setIntB |6087.703000 |2768.385000 | 119.90 %| >> |MergeStoreBench.setIntBU |6733.813000 |2950.240000 | 128.25 %| >> |MergeStoreBench.setIntBV |1362.233000 |1361.821000 | 0.03 %| >> |MergeStoreBench.setIntL |2834.785000 |2833.042000 | 0.06 %| >> |MergeStoreBench.setIntLU |2947.145000 |2946.874000 | 0.01 %| >> |MergeStoreBench.setIntLV |5506.791000 |5506.229000 | 0.01 %| >> |MergeStoreBench.setIntRB |7634.279000 |5611.058000 | 36.06 %| >> |MergeStoreBench.setIntRBU |7766.737000 |5551.281000 | 39.91 %| >> |MergeStoreBench.setIntRL |5689.793000 |5689.385000 | 0.01 %| >> |MergeStoreBench.setIntRLU |5628.287000 |5628.789000 | -0.01 %| >> |MergeStoreBench.setIntRU |5536.039000 |5534.910000 | 0.02 %| >> |MergeStoreBench.setIntU |5595.363000 |5567.810000 | 0.49 %| >> |MergeStoreBench.setLongB |13722.671000 |6811.098000 | 101.48 %| >> |MergeStoreBench.setLongBU |13728.844000 |4280.240000 | 220.75 %| >> |MergeStoreBench.setLongBV |2785.255000 |2785.949000 | -0.02 %| >> |MergeStoreBench.setLongL |5714.615000 |5710.402000 | 0.07 %| 
>> |MergeStoreBench.setLongLU |4128.746000 |4129.324000 | -0.01 %| >> |MergeStoreBench.setLongLV |2793.125000 |2794.438000 | -0.05 %| >> |MergeStoreBench.setLongRB |14465.223000 |7015.050000 | 106.20 %| >> |MergeStoreBench.setLongRBU |14546.954000 |6173.210000 | 135.65 %| >> |MergeStoreBench.setLongRL |6816.145000 |6813.348000 | 0.04 %| >> |MergeStoreBench.setLongRLU |4289.445000 |4284.239000 | 0.12 %| >> |MergeStoreBench.setLongRU |3132.471000 |3133.093000 | -0.02 %| >> |MergeStoreBench.setLongU |3086.779000 |3087.298000 | -0.02 %| >> >> AMD EPYC 9T24 >> ... > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Update riscv comments @kuaiwei Your change (at version 92e2fcb71ede73b07d3aae4454f9df764d223089) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23030#issuecomment-2712286023 From duke at openjdk.org Tue Mar 11 01:55:04 2025 From: duke at openjdk.org (kuaiwei) Date: Tue, 11 Mar 2025 01:55:04 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value [v14] In-Reply-To: References: Message-ID: <9loZ9HA4z0MOn2w6ez5uVI0uz-TZKhUe1YvJxb09lmQ=.2d8d002d-6f4a-4934-a0db-a43db97b6272@github.com> On Mon, 10 Mar 2025 12:03:22 GMT, Tobias Hartmann wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Update riscv comments > > Looks good to me too. @TobiHartmann @eme64 Thanks for your approve. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23030#issuecomment-2712286699 From duke at openjdk.org Tue Mar 11 02:10:11 2025 From: duke at openjdk.org (kuaiwei) Date: Tue, 11 Mar 2025 02:10:11 GMT Subject: Integrated: 8347405: MergeStores with reverse bytes order value In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 09:07:11 GMT, kuaiwei wrote: > This patch enhance MergeStores optimization to support merge value with reverse byte order. 
> > Below is benchmark result before and after the patch: > > On aliyun g8y (aarch64) > |name | before | score2 | ratio | > |---|---|---|---| > |MergeStoreBench.setCharBS |5669.655000 |5669.566000 | 0.00 %| > |MergeStoreBench.setCharBV |5516.911000 |5516.273000 | 0.01 %| > |MergeStoreBench.setCharC |5578.644000 |5552.809000 | 0.47 %| > |MergeStoreBench.setCharLS |5782.140000 |5779.264000 | 0.05 %| > |MergeStoreBench.setCharLV |5496.403000 |5499.195000 | -0.05 %| > |MergeStoreBench.setIntB |6087.703000 |2768.385000 | 119.90 %| > |MergeStoreBench.setIntBU |6733.813000 |2950.240000 | 128.25 %| > |MergeStoreBench.setIntBV |1362.233000 |1361.821000 | 0.03 %| > |MergeStoreBench.setIntL |2834.785000 |2833.042000 | 0.06 %| > |MergeStoreBench.setIntLU |2947.145000 |2946.874000 | 0.01 %| > |MergeStoreBench.setIntLV |5506.791000 |5506.229000 | 0.01 %| > |MergeStoreBench.setIntRB |7634.279000 |5611.058000 | 36.06 %| > |MergeStoreBench.setIntRBU |7766.737000 |5551.281000 | 39.91 %| > |MergeStoreBench.setIntRL |5689.793000 |5689.385000 | 0.01 %| > |MergeStoreBench.setIntRLU |5628.287000 |5628.789000 | -0.01 %| > |MergeStoreBench.setIntRU |5536.039000 |5534.910000 | 0.02 %| > |MergeStoreBench.setIntU |5595.363000 |5567.810000 | 0.49 %| > |MergeStoreBench.setLongB |13722.671000 |6811.098000 | 101.48 %| > |MergeStoreBench.setLongBU |13728.844000 |4280.240000 | 220.75 %| > |MergeStoreBench.setLongBV |2785.255000 |2785.949000 | -0.02 %| > |MergeStoreBench.setLongL |5714.615000 |5710.402000 | 0.07 %| > |MergeStoreBench.setLongLU |4128.746000 |4129.324000 | -0.01 %| > |MergeStoreBench.setLongLV |2793.125000 |2794.438000 | -0.05 %| > |MergeStoreBench.setLongRB |14465.223000 |7015.050000 | 106.20 %| > |MergeStoreBench.setLongRBU |14546.954000 |6173.210000 | 135.65 %| > |MergeStoreBench.setLongRL |6816.145000 |6813.348000 | 0.04 %| > |MergeStoreBench.setLongRLU |4289.445000 |4284.239000 | 0.12 %| > |MergeStoreBench.setLongRU |3132.471000 |3133.093000 | -0.02 %| > |MergeStoreBench.setLongU 
|3086.779000 |3087.298000 | -0.02 %| > > AMD EPYC 9T24 > |name | before | after | ratio | > |---|---|---|---| > |MergeStoreBench.setChar... This pull request has now been integrated. Changeset: 59282092 Author: Kuai Wei Committer: Shaojin Wen URL: https://git.openjdk.org/jdk/commit/5928209280e7a655a22f11bc03eae32a4e99756c Stats: 226 lines in 3 files changed: 142 ins; 14 del; 70 mod 8347405: MergeStores with reverse bytes order value Co-authored-by: Richard Reingruber Reviewed-by: epeter, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/23030 From iklam at openjdk.org Tue Mar 11 03:17:58 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 11 Mar 2025 03:17:58 GMT Subject: RFR: 8350982: -server|-client causes fatal exception on static JDK In-Reply-To: <7YqG7ryKpJEk6hBnKnTj3Y_O2p_9x7XfT0M6jNw7nQg=.c5b75d99-9ce0-4898-a174-3646cfc4402a@github.com> References: <1UivDxf7iNhuhkTsh0S60VECAnkYkH4HQrYMmlCrZy0=.179c743b-4495-498c-b4ab-9fc0efce9467@github.com> <7YqG7ryKpJEk6hBnKnTj3Y_O2p_9x7XfT0M6jNw7nQg=.c5b75d99-9ce0-4898-a174-3646cfc4402a@github.com> Message-ID: On Mon, 10 Mar 2025 19:54:09 GMT, Alan Bateman wrote: > @AlanBateman @dholmes-ora @iklam Do you have any other comments/questions about the change? @vnkozlov or others from compiler side, can you please take a look of the change as well? Thanks Please remove `* @bug 8005933` from XShareAuto.java as it no longer applies to that bug ID. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23881#issuecomment-2712457754 From epeter at openjdk.org Tue Mar 11 07:13:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Mar 2025 07:13:04 GMT Subject: RFR: 8351414: C2: MergeStores must happen after RangeCheck smearing [v2] In-Reply-To: References: <1f0hsdcFse85AAgiDu8X7jvGZoesIjLJZ1GTofgJxWo=.5c2e2f1e-66f5-4597-9602-41fa426b3e48@github.com> Message-ID: On Mon, 10 Mar 2025 15:10:26 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> For Christian, add comment > > Marked as reviewed by chagedorn (Reviewer). Thanks @chhagedorn @merykitty for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23944#issuecomment-2712907697 From epeter at openjdk.org Tue Mar 11 07:13:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Mar 2025 07:13:05 GMT Subject: Integrated: 8351414: C2: MergeStores must happen after RangeCheck smearing In-Reply-To: References: Message-ID: <-4hEDnWwq117exytIs3nmmx-u2xukd06qMMovAl11yc=.adcc8ed7-4458-47fc-b5ea-8982321bcd4a@github.com> On Fri, 7 Mar 2025 15:07:37 GMT, Emanuel Peter wrote: > With [JDK-8348959](https://bugs.openjdk.org/browse/JDK-8348959) we see that there can be some issues when RangeCheck smearing happens in the same IGVN phase as MergeStores. It means that some RangeChecks are still around as we do MergeStores, and then we cannot merge as many stores as we would like. We should ensure that RangeCheck smearing happens during post-loop-opts, and then MergeStores happens in a separate dedicated IGVN round afterwards. This pull request has now been integrated. 
Changeset: 4cf63160 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/4cf63160ad575d49dbe70f128cd36aba22b8f2ff Stats: 110 lines in 8 files changed: 106 ins; 0 del; 4 mod 8351414: C2: MergeStores must happen after RangeCheck smearing Reviewed-by: chagedorn, qamai ------------- PR: https://git.openjdk.org/jdk/pull/23944 From chagedorn at openjdk.org Tue Mar 11 07:34:54 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 11 Mar 2025 07:34:54 GMT Subject: RFR: 8351345: [IR Framework] Improve reported disabled IR verification messages [v3] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 18:02:33 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this trivial patch? >> Currently, it only reports that some flag is non-whitelisted, but does not print out the flag explicitly, but it's helpful to do so. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > refine Looks much better! Two more suggestions, then I think it's good to go :-) test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 608: > 606: // No IR verification is done if additional non-whitelisted JTreg VM or Javaoptions flag is specified. 
> 607: List nonWhiteListedFlags = anyNonWhitelistedJTregVMAndJavaOptsFlags(); > 608: nonWhiteListedTest = nonWhiteListedFlags.isEmpty(); You can directly add the type declarations here: boolean debugTest = Platform.isDebugBuild(); boolean intTest = !Platform.isInt(); boolean compTest = !Platform.isComp(); boolean irTest = hasIRAnnotations(); boolean nonWhiteListedTest = nonWhiteListedFlags.isEmpty(); test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 620: > 618: } > 619: if (!intTest) { > 620: System.out.println("- Running with -Xint (use warm-up of 0 instead)"); Suggestion: System.out.println("- Running with -Xint (no compilations)"); ------------- PR Review: https://git.openjdk.org/jdk/pull/23931#pullrequestreview-2673168418 PR Review Comment: https://git.openjdk.org/jdk/pull/23931#discussion_r1988573943 PR Review Comment: https://git.openjdk.org/jdk/pull/23931#discussion_r1988574873 From mbaesken at openjdk.org Tue Mar 11 08:15:53 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 11 Mar 2025 08:15:53 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> Message-ID: On Tue, 11 Mar 2025 00:02:14 GMT, Dean Long wrote: >> When running jtreg tests on macOS aarch64 with ubsan - enabled binaries, in the test >> java/foreign/TestHandshake >> this error/warning is reported : >> >> jdk/src/hotspot/share/opto/block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' >> UndefinedBehaviorSanitizer:DEADLYSIGNAL >> UndefinedBehaviorSanitizer: nested bug in the same thread, aborting. 
>> >> Seems it happens in this calculation (float value does not fit into an int) : int to_pct = (int) ((100 * freq) / target->_freq); > > src/hotspot/share/opto/block.cpp line 1617: > >> 1615: float f_from_pct = (100 * freq) / b->_freq; >> 1616: float f_to_pct = (100 * freq) / target->_freq; >> 1617: int from_pct = (f_from_pct < (float)INT_MAX) ? (int)f_from_pct : INT_MAX; > > I think (float)INT_MAX is problematic. Due to rounding, isn't the result actually greater than INT_MAX? > Does it even make sense to have a "pct" that is greater than 100 here? > Do we want `int from_pct = MIN2((double)INT_MAX, (double)f_from_pct);` or maybe > `int from_pct = MIN2((100.0, (double)f_from_pct);`? Hi Dean, I added the (float) cast because AIX failed to compile the file without it , for the other platforms/compilers it was okay without the cast. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r1988646594 From mbaesken at openjdk.org Tue Mar 11 08:15:54 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 11 Mar 2025 08:15:54 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> Message-ID: On Tue, 11 Mar 2025 08:11:19 GMT, Matthias Baesken wrote: >> src/hotspot/share/opto/block.cpp line 1617: >> >>> 1615: float f_from_pct = (100 * freq) / b->_freq; >>> 1616: float f_to_pct = (100 * freq) / target->_freq; >>> 1617: int from_pct = (f_from_pct < (float)INT_MAX) ? (int)f_from_pct : INT_MAX; >> >> I think (float)INT_MAX is problematic. Due to rounding, isn't the result actually greater than INT_MAX? >> Does it even make sense to have a "pct" that is greater than 100 here? >> Do we want `int from_pct = MIN2((double)INT_MAX, (double)f_from_pct);` or maybe >> `int from_pct = MIN2((100.0, (double)f_from_pct);`? 
> > Hi Dean, I added the (float) cast because AIX failed to compile the file without it , for the other platforms/compilers it was okay without the cast. > Does it even make sense to have a "pct" that is greater than 100 here I am not so familiar with the ranges that make sense for the from_pct. Maybe some C2 compiler expert could comment? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r1988651183 From dfenacci at openjdk.org Tue Mar 11 08:46:54 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 11 Mar 2025 08:46:54 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> Message-ID: On Tue, 11 Mar 2025 00:02:14 GMT, Dean Long wrote: >> When running jtreg tests on macOS aarch64 with ubsan - enabled binaries, in the test >> java/foreign/TestHandshake >> this error/warning is reported : >> >> jdk/src/hotspot/share/opto/block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' >> UndefinedBehaviorSanitizer:DEADLYSIGNAL >> UndefinedBehaviorSanitizer: nested bug in the same thread, aborting. >> >> Seems it happens in this calculation (float value does not fit into an int) : int to_pct = (int) ((100 * freq) / target->_freq); > > src/hotspot/share/opto/block.cpp line 1617: > >> 1615: float f_from_pct = (100 * freq) / b->_freq; >> 1616: float f_to_pct = (100 * freq) / target->_freq; >> 1617: int from_pct = (f_from_pct < (float)INT_MAX) ? (int)f_from_pct : INT_MAX; > > I think (float)INT_MAX is problematic. Due to rounding, isn't the result actually greater than INT_MAX? > Does it even make sense to have a "pct" that is greater than 100 here? 
> Do we want `int from_pct = MIN2((double)INT_MAX, (double)f_from_pct);` or maybe > `int from_pct = MIN2((100.0, (double)f_from_pct);`? Adding to @dean-long's questions, I was wondering how we can get to a 9.97582e+36 value (since it is a runtime ubsan issue): is this a result of successive rounding-ups or there is perhaps an upstream issue? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r1988706183 From mbaesken at openjdk.org Tue Mar 11 09:06:07 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 11 Mar 2025 09:06:07 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> Message-ID: On Tue, 11 Mar 2025 00:02:14 GMT, Dean Long wrote: >> When running jtreg tests on macOS aarch64 with ubsan - enabled binaries, in the test >> java/foreign/TestHandshake >> this error/warning is reported : >> >> jdk/src/hotspot/share/opto/block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' >> UndefinedBehaviorSanitizer:DEADLYSIGNAL >> UndefinedBehaviorSanitizer: nested bug in the same thread, aborting. >> >> Seems it happens in this calculation (float value does not fit into an int) : int to_pct = (int) ((100 * freq) / target->_freq); > > src/hotspot/share/opto/block.cpp line 1617: > >> 1615: float f_from_pct = (100 * freq) / b->_freq; >> 1616: float f_to_pct = (100 * freq) / target->_freq; >> 1617: int from_pct = (f_from_pct < (float)INT_MAX) ? (int)f_from_pct : INT_MAX; > > I think (float)INT_MAX is problematic. Due to rounding, isn't the result actually greater than INT_MAX? > Does it even make sense to have a "pct" that is greater than 100 here? 
> Do we want `int from_pct = MIN2((double)INT_MAX, (double)f_from_pct);` or maybe > `int from_pct = MIN2((100.0, (double)f_from_pct);`? > Adding to @dean-long's questions, I was wondering how we can get to a 9.97582e+36 value (since it is a runtime ubsan issue): is this a result of successive rounding-ups or there is perhaps an upstream issue? That's a good question. I could add a bit of tracing or asserts to find out more about this. It seems this is an aarch64 related thing because on (Linux) x86_64/ppc64le I never observed this. Do you think those high values are not expected ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r1988741722 From rcastanedalo at openjdk.org Tue Mar 11 09:21:03 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 11 Mar 2025 09:21:03 GMT Subject: RFR: 8350485: C2: factor out common code in Node::grow() and Node::out_grow() [v2] In-Reply-To: References: Message-ID: <-S3TThpR2YYay57wwU5XUlFXm8ZambF36RKQpCIPNjw=.1e7442ca-1d11-4bbf-ab9d-fe6a198e80f8@github.com> On Mon, 10 Mar 2025 14:06:36 GMT, Saranya Natarajan wrote: >> Node:grow() and Node::out_grow() are copy-pasted from each other and their core logic could be factored out into a third function or at least cleaned up. Hence,the fix includes a function Node::array_resize() that implements the core logic of Node::grow() and Node::out_grow(). >> >> Link to Github action which had no failures : https://github.com/sarannat/jdk/actions/runs/13677508359 > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > 8350485: Addressing review comments on code style Thanks for cleaning up this code, Saranya! I have a few more style/naming suggestions, the refactored logic looks otherwise good to me. 
src/hotspot/share/opto/node.cpp line 667: > 665: } > 666: > 667: // Resize input or output array to grow it the next larger power-of-2 bigger Suggestion: // Resize input or output array to grow it to the next larger power-of-2 bigger src/hotspot/share/opto/node.cpp line 669: > 667: // Resize input or output array to grow it the next larger power-of-2 bigger > 668: // than len. > 669: void Node::resize_array(Node**& array, node_idx_t& max_size, uint len, bool is_input_array) { This is subjective, but I would find it clearer if the name of `bool is_input_array` reflected what does `resize_array` needs to do with `array` rather than what is the source/origin of `array`. My suggestion would be something like `bool needs_clearing`, `bool initialize_to_null`, or similar. src/hotspot/share/opto/node.cpp line 686: > 684: // Trimming to limit allows a uint8 to handle up to 255 edges. > 685: // Previously I was using only powers-of-2 which peaked at 128 edges. > 686: //if( new_max >= limit ) new_max = limit-1; I suggest to remove this line that was already commented out before this changeset. src/hotspot/share/opto/node.cpp line 689: > 687: if (!is_input_array) { > 688: assert(array != nullptr && array != NO_OUT_ARRAY, "out must have sensible value"); > 689: } This is somewhat subjective, but I prefer to inline the `!is_input_array` pre-condition into the assertion itself, for compactness. Suggestion: assert(is_input_array || (array != nullptr && array != NO_OUT_ARRAY), "out must have sensible value"); src/hotspot/share/opto/node.cpp line 697: > 695: // This assertion makes sure that Node::_max is wide enough to > 696: // represent the numerical value of new_max. 
> 697: assert(max_size == new_max && max_size > len, "int width of _max is too small"); This is pre-existing, but I think it is worth simplifying anyway: Suggestion: assert(max_size > len, "int width of _max is too small"); src/hotspot/share/opto/node.cpp line 708: > 706: //-----------------------------out_grow---------------------------------------- > 707: // Grow the input array, making space for more edges > 708: void Node::out_grow( uint len ) { For style consistency with `Node::grow`: Suggestion: void Node::out_grow(uint len) { src/hotspot/share/opto/node.hpp line 339: > 337: void out_grow( uint len ); > 338: // Resize input or output array to grow it the next larger power-of-2 bigger > 339: // than len. Suggestion: // Resize input or output array to grow it to the next larger power-of-2 // bigger than len. ------------- PR Review: https://git.openjdk.org/jdk/pull/23928#pullrequestreview-2673440209 PR Review Comment: https://git.openjdk.org/jdk/pull/23928#discussion_r1988721217 PR Review Comment: https://git.openjdk.org/jdk/pull/23928#discussion_r1988764754 PR Review Comment: https://git.openjdk.org/jdk/pull/23928#discussion_r1988740998 PR Review Comment: https://git.openjdk.org/jdk/pull/23928#discussion_r1988745651 PR Review Comment: https://git.openjdk.org/jdk/pull/23928#discussion_r1988738802 PR Review Comment: https://git.openjdk.org/jdk/pull/23928#discussion_r1988754817 PR Review Comment: https://git.openjdk.org/jdk/pull/23928#discussion_r1988715587 From duke at openjdk.org Tue Mar 11 09:24:13 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 11 Mar 2025 09:24:13 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v10] In-Reply-To: References: Message-ID: <8gBDkSUJY2TbzOQFEGgIYR1Mn0F3bKZLfADwtSoCxEc=.f9b5b0d3-565b-43f3-ac38-0904f5406c94@github.com> > This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). 
Care must be taken in the case con1 + con2 is bigger than the number of bits in the integer type. In this case, we must simplify to 0. > > Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32 bits into a 16 bits integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simple `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR looks like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitude. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. > > This also works for multiplications by powers of 2 since they are already translated into shifts. 
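The shift-collapsing rule described above can be sketched as follows. This is an illustration in C++ on unsigned 32-bit values, not the C2 implementation; `double_shl` and `collapsed_shl` are hypothetical names. The explicit zero case matters because Java masks an `int` shift count to its low 5 bits, so a naive `x << (con1 + con2)` with a sum of 40 would behave like `x << 8` rather than produce 0.

```cpp
#include <cassert>
#include <cstdint>

// Reference form: two consecutive left shifts by constants.
// Assumption for this sketch: each individual count is below 32.
inline uint32_t double_shl(uint32_t x, unsigned c1, unsigned c2) {
  assert(c1 < 32 && c2 < 32);
  return (x << c1) << c2;
}

// Collapsed form: one shift by the sum, or 0 when the sum reaches the
// bit width, since every original bit has then been shifted out.
inline uint32_t collapsed_shl(uint32_t x, unsigned c1, unsigned c2) {
  assert(c1 < 32 && c2 < 32);
  const unsigned total = c1 + c2;
  return total >= 32 ? 0u : (x << total);
}
```

For example, with `c1 = 3` and `c2 = 16` both forms compute `x << 19`, while with `c1 = c2 = 20` both yield 0.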
> > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Copied bits_per_java_integer, hoping it will merge nicely ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23728/files - new: https://git.openjdk.org/jdk/pull/23728/files/b07d0a2c..042c4dd3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=08-09 Stats: 9 lines in 2 files changed: 8 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23728.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23728/head:pull/23728 PR: https://git.openjdk.org/jdk/pull/23728 From duke at openjdk.org Tue Mar 11 09:24:13 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 11 Mar 2025 09:24:13 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v5] In-Reply-To: References: <0-rhfBkyIO8uh5uioQ4XDoEWGwRdfzah8GJrYvILeDM=.699a73a4-ffd2-42f1-b7df-4c32235b3218@github.com> Message-ID: On Mon, 10 Mar 2025 15:56:36 GMT, Emanuel Peter wrote: >> Happily, as soon as this other PR is merged! > > Or you just copy it and hope that the merge will eventually be clean, up to you. I choose to hope! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1988775536 From duke at openjdk.org Tue Mar 11 09:38:10 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 11 Mar 2025 09:38:10 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v11] In-Reply-To: References: Message-ID: > This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). Care must be taken in the case con1 + con2 is bigger than the number of bits in the integer type. In this case, we must simplify to 0. > > Moreover, the simplification logic of the sign extension trick had to be improved. 
For instance, we use `(x << 16) >> 16` to convert a 32 bits into a 16 bits integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simple `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR looks like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitude. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. > > This also works for multiplications by powers of 2 since they are already translated into shifts. > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: - char -> byte - nit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23728/files - new: https://git.openjdk.org/jdk/pull/23728/files/042c4dd3..abe137a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=09-10 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23728.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23728/head:pull/23728 PR: https://git.openjdk.org/jdk/pull/23728 From duke at openjdk.org Tue Mar 11 09:38:10 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 11 Mar 2025 09:38:10 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v9] In-Reply-To: References: <-ea_PqhfAG9kdSWC_MsAKeSjgfUazBWr9QB1f_gJUdo=.170e7391-65c3-4aa6-b2f4-779182997156@github.com> Message-ID: On Mon, 10 Mar 2025 15:26:07 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull 
request incrementally with one additional commit since the last revision: >> >> Random testing, trying... > > src/hotspot/share/opto/memnode.cpp line 3536: > >> 3534: // It is thus useful to handle the case where conIL > conIR. >> 3535: // >> 3536: // Let's assume we have the following 32 bits integer that we want to stuff in 8 bits char: > > `char` actually is `16` bits unsigned in Java ;) Yes, right. I replaced with byte. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1988803887 From mli at openjdk.org Tue Mar 11 09:39:28 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Mar 2025 09:39:28 GMT Subject: RFR: 8351345: [IR Framework] Improve reported disabled IR verification messages [v4] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this trivial patch? > Currently, it only reports that some flag is non-whitelisted, but does not print out the flag explicitly, but it's helpful to do so. > > Thanks! Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: minor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23931/files - new: https://git.openjdk.org/jdk/pull/23931/files/62cc8dbc..2c864da3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23931&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23931&range=02-03 Stats: 8 lines in 1 file changed: 0 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23931.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23931/head:pull/23931 PR: https://git.openjdk.org/jdk/pull/23931 From mli at openjdk.org Tue Mar 11 09:39:28 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Mar 2025 09:39:28 GMT Subject: RFR: 8351345: [IR Framework] Improve reported disabled IR verification messages [v3] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 07:30:35 GMT, Christian Hagedorn wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since 
the last revision:
>>
>>   refine
>
> test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 608:
>
>> 606:         // No IR verification is done if additional non-whitelisted JTreg VM or Javaoptions flag is specified.
>> 607:         List nonWhiteListedFlags = anyNonWhitelistedJTregVMAndJavaOptsFlags();
>> 608:         nonWhiteListedTest = nonWhiteListedFlags.isEmpty();
>
> You can directly add the type declarations here:
>
> boolean debugTest = Platform.isDebugBuild();
> boolean intTest = !Platform.isInt();
> boolean compTest = !Platform.isComp();
> boolean irTest = hasIRAnnotations();
>
> boolean nonWhiteListedTest = nonWhiteListedFlags.isEmpty();

Yes, fixed.

> test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 620:
>
>> 618:         }
>> 619:         if (!intTest) {
>> 620:             System.out.println("- Running with -Xint (use warm-up of 0 instead)");
>
> Suggestion:
>
> System.out.println("- Running with -Xint (no compilations)");

Thanks for the suggestion!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23931#discussion_r1988805712
PR Review Comment: https://git.openjdk.org/jdk/pull/23931#discussion_r1988805876

From duke at openjdk.org Tue Mar 11 10:03:43 2025
From: duke at openjdk.org (Marc Chevalier)
Date: Tue, 11 Mar 2025 10:03:43 GMT
Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v12]
In-Reply-To:
References:
Message-ID:

> This collapses two consecutive left shifts by constants into a single shift: (x << con1) << con2 => x << (con1 + con2). Care must be taken in the case where con1 + con2 is not smaller than the number of bits in the integer type. In this case, we must simplify to 0.
>
> Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32-bit integer into a 16-bit integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simple `x`.
But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR looks like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitude. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. > > This also works for multiplications by powers of 2 since they are already translated into shifts. > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: rephrase ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23728/files - new: https://git.openjdk.org/jdk/pull/23728/files/abe137a9..7421ea50 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=10-11 Stats: 32 lines in 1 file changed: 6 ins; 3 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/23728.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23728/head:pull/23728 PR: https://git.openjdk.org/jdk/pull/23728 From duke at openjdk.org Tue Mar 11 10:03:44 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 11 Mar 2025 10:03:44 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v9] In-Reply-To: References: <-ea_PqhfAG9kdSWC_MsAKeSjgfUazBWr9QB1f_gJUdo=.170e7391-65c3-4aa6-b2f4-779182997156@github.com> Message-ID: On Mon, 10 Mar 2025 15:38:56 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/memnode.cpp line 3541: >> >>> 3539: // +------------------------+---------+ >>> 3540: // 31 8 7 0 >>> 3541: // v[0..7] is meaningful, but v[8..31] is not. In this case, num_rejected_bits == 24. 
>> >> Would this example not be nice with the original case above? >> >> // Check for useless sign-extension before a partial-word store >> // (StoreB ... (RShiftI _ (LShiftI _ valIn conIL ) conIR) ) >> // If (conIL == conIR && conIR <= num_bits) this simplifies to >> // (StoreB ... (valIn) ) >> >> Because it seems you are assuming here that `conIL == conIR`, right? And then below you ask what if they are not equal. > > It could also be nice to introduce the `num_rejected_bits` somewhere. I don't understand what you mean. I'm not speaking about `conIR` and `conIL` here. I'm just saying we have a value v that we want to store in a 8-bit long storage (a byte). We can see v as made of its 8 lower bits (that we want to actually store) and the 24 upper bits (that we don't care about). I'm just introducing notations, but I haven't done any operation yet. I tried to rephrase around there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1988862498 From duke at openjdk.org Tue Mar 11 10:09:37 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 11 Mar 2025 10:09:37 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v13] In-Reply-To: References: Message-ID: > This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). Care must be taken in the case con1 + con2 is bigger than the number of bits in the integer type. In this case, we must simplify to 0. > > Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32 bits into a 16 bits integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simple `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR looks like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. 
The former logic didn't handle the case where the left and the right shift have different magnitude. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. > > This also works for multiplications by powers of 2 since they are already translated into shifts. > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with three additional commits since the last revision: - correct - s - rephrased corner case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23728/files - new: https://git.openjdk.org/jdk/pull/23728/files/7421ea50..f244e177 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=11-12 Stats: 10 lines in 1 file changed: 4 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23728.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23728/head:pull/23728 PR: https://git.openjdk.org/jdk/pull/23728 From duke at openjdk.org Tue Mar 11 10:09:38 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 11 Mar 2025 10:09:38 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v9] In-Reply-To: References: <-ea_PqhfAG9kdSWC_MsAKeSjgfUazBWr9QB1f_gJUdo=.170e7391-65c3-4aa6-b2f4-779182997156@github.com> Message-ID: On Mon, 10 Mar 2025 15:33:48 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Random testing, trying... > > src/hotspot/share/opto/memnode.cpp line 3549: > >> 3547: // Let's also remember that conIL < 32 since (x << 33) is simplified into (x << 1) >> 3548: // and (x << 31) << 2 is simplified into 0. 
This means that in any case, after the >> 3549: // left shift, we always have at least one bit of the original v. > > What does `original v` refer to? Is this the same as the `X` and the `valIn`? rephrased > src/hotspot/share/opto/memnode.cpp line 3577: > >> 3575: // +------------------+---------+-----+ >> 3576: // 31 8 7 2 1 0 >> 3577: // The non-rejected bits are the 8 lower one of (v << conIL - conIR). > > Suggestion: > > // The non-rejected bits are the 8 lower ones of (v << conIL - conIR). nice. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1988873142 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1988872799 From duke at openjdk.org Tue Mar 11 10:09:38 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 11 Mar 2025 10:09:38 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v9] In-Reply-To: References: <-ea_PqhfAG9kdSWC_MsAKeSjgfUazBWr9QB1f_gJUdo=.170e7391-65c3-4aa6-b2f4-779182997156@github.com> Message-ID: On Mon, 10 Mar 2025 15:40:34 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/memnode.cpp line 3579: >> >>> 3577: // The non-rejected bits are the 8 lower one of (v << conIL - conIR). >>> 3578: // The bits 6 and 7 of v have been thrown away after the shift left. >>> 3579: // The simplification is still fine. >> >> Suggestion for everywhere: `fine` -> `valid`. > > Or correct. I like correct. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1988874742 From duke at openjdk.org Tue Mar 11 10:41:11 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 11 Mar 2025 10:41:11 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v14] In-Reply-To: References: Message-ID: > This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). 
Care must be taken in the case con1 + con2 is bigger than the number of bits in the integer type. In this case, we must simplify to 0. > > Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32 bits into a 16 bits integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simple `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR looks like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitude. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. > > This also works for multiplications by powers of 2 since they are already translated into shifts. 
> > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: - order - rephrase ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23728/files - new: https://git.openjdk.org/jdk/pull/23728/files/f244e177..5afbe37f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=12-13 Stats: 65 lines in 1 file changed: 55 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/23728.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23728/head:pull/23728 PR: https://git.openjdk.org/jdk/pull/23728 From duke at openjdk.org Tue Mar 11 10:41:11 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 11 Mar 2025 10:41:11 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v9] In-Reply-To: References: <-ea_PqhfAG9kdSWC_MsAKeSjgfUazBWr9QB1f_gJUdo=.170e7391-65c3-4aa6-b2f4-779182997156@github.com> Message-ID: On Mon, 10 Mar 2025 15:50:47 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Random testing, trying... > > src/hotspot/share/opto/memnode.cpp line 3546: > >> 3544: // is valid if: >> 3545: // - conIL >= conIR >> 3546: // - conIR <= num_rejected_bits > > Is there also a restriction on `conIL`? No, see the examples. I've added some paragraphs that may help understanding why not: the shift left will only discard higher bits, and introduce 0-lower bits. It does nothing dangerous by itself. The right shift is risky because it might pull the sign bit of (v << conIL) into the bits considered by the store, and actually changing the value. > src/hotspot/share/opto/memnode.cpp line 3581: > >> 3579: // The simplification is still fine. >> 3580: // >> 3581: // ### Case 3: conIL > conIR < num_rejected_bits. 
> > Suggestion: > > // ### Case 3: conIL > conIR > > Or do you need that? And if so, do we only have `conIR < num_rejected_bits`, or also `conIL < num_rejected_bits`. > A combination of `>` and `<` in the same equation can be a little confusing ;) What I wrote is exactly what I meant. I do need `conIR < num_rejected_bits` (see former case 4, new case 2.3 for what happens otherwise). I don't need `conIL < num_rejected_bits` (see this example (former case 3, new case 2.2) where conIL = 26 > num_rejected_bits = 24). > src/hotspot/share/opto/memnode.cpp line 3593: > >> 3591: // +------------------+---------+-----+ >> 3592: // 31 10 9 4 3 0 >> 3593: // The non-rejected bits are the 8 lower one of (v << conIL - conIR). > > Suggestion: > > // The non-rejected bits are the 8 lower ones of (v << conIL - conIR). > > But it seems we only actually kept 6 of them? nice > src/hotspot/share/opto/memnode.cpp line 3595: > >> 3593: // The non-rejected bits are the 8 lower one of (v << conIL - conIR). >> 3594: // The bits 6 and 7 of v have been thrown away after the shift left. >> 3595: // The bits 4 and 5 of v are still present, but outside of the kept bits (the 8 lower ones). > > I have trouble understanding this line. Bits 0-5 are still present. What do you mean by "outside of the kept bits"? Rephrased. Might be clearer now. > src/hotspot/share/opto/memnode.cpp line 3621: > >> 3619: // Valid if conIL >= conIR <= num_rejected_bits >> 3620: // >> 3621: // We do not treat the case conIR > conIL here since the point of this function is > > Suggestion: > > // We do not treat the case conIL < conIR here since the point of this function is > > Nit: just to keep the two values on the sides we have used up to now. Done. 
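As an aside to the discussion above, here is a minimal sketch of the validity condition for the byte-store case, written in plain Java rather than in C2 terms. It assumes a byte store (so `num_rejected_bits == 24`) and shift constants already in the range [0, 31]; the class and method names are hypothetical and do not come from the patch. It only checks that `(v << conIL) >> conIR` and `v << (conIL - conIR)` agree on the 8 stored bits whenever `conIL >= conIR` and `conIR <= 24`.

```java
public class StoreByteShiftCheck {
    // What a byte store keeps of the original expression: the 8 low bits.
    static byte original(int v, int conIL, int conIR) {
        return (byte) ((v << conIL) >> conIR);
    }

    // Candidate simplified form: a single left shift by the difference.
    static byte simplified(int v, int conIL, int conIR) {
        return (byte) (v << (conIL - conIR));
    }

    // Condition as discussed in the thread, specialised to a byte store
    // (assumption: num_rejected_bits == 24, constants already in [0, 31]).
    static boolean conditionHolds(int conIL, int conIR) {
        final int numRejectedBits = 24;
        return conIR >= 0 && conIL >= conIR && conIL < 32 && conIR <= numRejectedBits;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x40, 0x7F, 0x80, 0xFF, 0x12345678,
                         Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int conIL = 0; conIL < 32; conIL++) {
            for (int conIR = 0; conIR <= conIL; conIR++) {
                if (!conditionHolds(conIL, conIR)) {
                    continue;
                }
                for (int v : samples) {
                    if (original(v, conIL, conIR) != simplified(v, conIL, conIR)) {
                        throw new AssertionError("mismatch: v=" + v
                                + " conIL=" + conIL + " conIR=" + conIR);
                    }
                }
            }
        }
        // Outside the condition (conIR = 25 > 24) the right shift can pull the
        // sign bit of (v << conIL) into the stored bits, so the forms may differ.
        System.out.println(original(0x40, 25, 25) + " vs " + simplified(0x40, 25, 25));
    }
}
```

The last line illustrates why there is a bound on `conIR` but none on `conIL`: the left shift only discards high bits and introduces zero low bits, while the arithmetic right shift is the step that can change the stored value.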
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1988930285
PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1988930385
PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1988931045
PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1988931886
PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1988936395

From jbhateja at openjdk.org Tue Mar 11 10:52:04 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 11 Mar 2025 10:52:04 GMT
Subject: RFR: 8350840: C2: x64 Assembler::vpcmpeqq assert: failed: XMM register should be 0-15
Message-ID:

This bug fix patch addresses an assertion failure caused by an unexpected register operand encoding. The AVX2 flavour of the instruction "vpcmpeqq" expects to operate on XMM registers from the lower register bank (0-15); in this case, the register mask associated with the destination vector operand of the matcher pattern also includes registers from the higher bank.

The issue can be reliably reproduced by modifying the static allocation order of the XMM registers through an AD file change. Existing bug [JDK-8343294](https://bugs.openjdk.org/browse/JDK-8343294) already tracks the requirement to randomize the allocation ordering.

Kindly review and share your feedback.
Best Regards, Jatin ------------- Commit messages: - 8350840: C2: x64 Assembler::vpcmpeqq assert: failed: XMM register should be 0-15 Changes: https://git.openjdk.org/jdk/pull/23979/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23979&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350840 Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23979/head:pull/23979 PR: https://git.openjdk.org/jdk/pull/23979 From duke at openjdk.org Tue Mar 11 10:56:18 2025 From: duke at openjdk.org (David Linus Briemann) Date: Tue, 11 Mar 2025 10:56:18 GMT Subject: RFR: 8350866: [x86] Add C1 intrinsics for CRC32-C [v4] In-Reply-To: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> References: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> Message-ID: > Local benchmarks show good improvements for the crc32c intrinsification: > > > without intrinsic (master): > > > $JDK/java -DmsgSize=5120 -XX:TieredStopAtLevel=1 -Xcomp TestCRC32C 300000 > offset = 0 > msgSize = 5120 bytes > iters = 300000 > ------------------------------------------------------- > CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 > CRC32C.update(byte[]) runtime = 1.186507782 seconds > CRC32C.update(byte[]) throughput = 1294.5553525244388 MB/s > CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 > ------------------------------------------------------- > CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 > CRC32C.update(ByteBuffer) runtime = 1.355515648 seconds > CRC32C.update(ByteBuffer) throughput = 1133.1481139788364 MB/s > CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 > ------------------------------------------------------- > > > with intrinsic: > > > $JDK/java -DmsgSize=5120 -XX:TieredStopAtLevel=1 -Xcomp TestCRC32C 300000 > offset = 0 > msgSize = 5120 bytes > iters = 300000 > 
------------------------------------------------------- > CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 > CRC32C.update(byte[]) runtime = 0.065003188 seconds > CRC32C.update(byte[]) throughput = 23629.610289267657 MB/s > CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 > ------------------------------------------------------- > CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 > CRC32C.update(ByteBuffer) runtime = 0.072310133 seconds > CRC32C.update(ByteBuffer) throughput = 21241.836189127185 MB/s > CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 > ------------------------------------------------------- David Linus Briemann has updated the pull request incrementally with two additional commits since the last revision: - fix - address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23826/files - new: https://git.openjdk.org/jdk/pull/23826/files/0b93d006..c3eb92d2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23826&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23826&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23826.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23826/head:pull/23826 PR: https://git.openjdk.org/jdk/pull/23826 From duke at openjdk.org Tue Mar 11 10:58:14 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 11 Mar 2025 10:58:14 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v15] In-Reply-To: References: Message-ID: > This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). Care must be taken in the case con1 + con2 is bigger than the number of bits in the integer type. In this case, we must simplify to 0. > > Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32 bits into a 16 bits integer, with sign extension. 
When storing this into a 16-bit field, this can be simplified into simple `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR looks like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitude. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. > > This also works for multiplications by powers of 2 since they are already translated into shifts. > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: more checks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23728/files - new: https://git.openjdk.org/jdk/pull/23728/files/5afbe37f..e3ecf350 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=13-14 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23728.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23728/head:pull/23728 PR: https://git.openjdk.org/jdk/pull/23728 From duke at openjdk.org Tue Mar 11 10:58:15 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 11 Mar 2025 10:58:15 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v9] In-Reply-To: References: <-ea_PqhfAG9kdSWC_MsAKeSjgfUazBWr9QB1f_gJUdo=.170e7391-65c3-4aa6-b2f4-779182997156@github.com> Message-ID: On Mon, 10 Mar 2025 15:55:37 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Random testing, trying... 
> > src/hotspot/share/opto/memnode.cpp line 3632: > >> 3630: if (shr->Opcode() == Op_RShiftI) { >> 3631: const TypeInt* conIR = phase->type(shr->in(2))->isa_int(); >> 3632: if (conIR != nullptr && conIR->is_con() && conIR->get_con() <= num_rejected_bits) { > > How do we know that `conIR` (and also `conIL`) are not negative? That is a good one. If `maskShiftAmount` was there before (which happens in the idealization of shifts, which seems to happen even in GVN, so as far as I understand, during parsing) we put the shift in bounds with: https://github.com/openjdk/jdk/blob/cd9f1d3d921531511a7552807d099d5d3cce01a6/src/hotspot/share/opto/mulnode.cpp#L955 and then, if it changes the value (so if the shift magnitude wasn't in [0, nbits - 1]): https://github.com/openjdk/jdk/blob/cd9f1d3d921531511a7552807d099d5d3cce01a6/src/hotspot/share/opto/mulnode.cpp#L961-L962 So I suspect the shift will have a reasonable magnitude early enough. But indeed, I'm not sure enough, and I think an additional check doesn't hurt. On the other hand, the previous version of this wasn't performing any such check, so if it's a bug, it existed already. Bonus: according to https://docs.oracle.com/javase/specs/jls/se23/html/jls-15.html#jls-15.19, it seems that the logic of `maskShiftAmount` is correct in case of negative numbers (just bitwise masking). 
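For readers less familiar with the shift-count masking referred to above, the following is a small Java-level sketch of the behaviour described in JLS 15.19 and of the corner case in the double-shift collapse. It is not the HotSpot code: `maskedCount` and `collapse` are hypothetical helpers written for illustration only, and the sketch covers `int` shifts only.

```java
public class ShiftCountDemo {
    // JLS 15.19: for an int left operand, only the five low bits of the
    // shift count are used, which also covers negative counts.
    static int maskedCount(int count) {
        return count & 31;
    }

    // Collapse (x << con1) << con2 into a single shift. When the masked
    // counts add up to 32 or more, every original bit is shifted out,
    // so the result is 0 rather than x << ((con1 + con2) & 31).
    static int collapse(int x, int con1, int con2) {
        int sum = maskedCount(con1) + maskedCount(con2);
        return sum >= 32 ? 0 : x << sum;
    }

    public static void main(String[] args) {
        int x = 5;
        System.out.println((x << 33) == (x << 1));                  // single shift: count is masked
        System.out.println((x << -1) == (x << 31));                 // negative count: masked as well
        System.out.println(((x << 31) << 2) == collapse(x, 31, 2)); // chained shifts: all bits lost
        System.out.println(((x << 3) << 16) == collapse(x, 3, 16)); // ordinary case: x << 19
    }
}
```

The point of the sketch is the asymmetry between the two situations: a single shift by 33 behaves like a shift by 1, whereas a chain whose counts sum to 33 yields 0, so the collapsed form cannot simply mask the sum.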
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r1988970004

From duke at openjdk.org Tue Mar 11 11:10:10 2025
From: duke at openjdk.org (Saranya Natarajan)
Date: Tue, 11 Mar 2025 11:10:10 GMT
Subject: RFR: 8330469: C2: Remove or change "PrintOpto && VerifyLoopOptimizations" as printing code condition [v2]
In-Reply-To: <0u5jTfXdZR63O78QixBJEql2WKWLvcyaeqsN_gAL3OA=.40f718c3-b99e-4183-80e1-fb3b5dc6074e@github.com>
References: <0u5jTfXdZR63O78QixBJEql2WKWLvcyaeqsN_gAL3OA=.40f718c3-b99e-4183-80e1-fb3b5dc6074e@github.com>
Message-ID: <2QTYQCodAeUq8jIvqiTmBELmgkdDck_fYLX6if2NHcU=.18d3549a-76a2-482a-bebe-3e379f1db82b@github.com>

> **Issue:** There are currently 9 occurrences where we guard printing code with PrintOpto && VerifyLoopOptimizations. This flag combo is never really used in practice.
>
> **Solution**: I analysed the 9 occurrences. In cases where the flag `PrintOpto && VerifyLoopOptimizations` was followed by the flag `TraceLoopOpts` with `else if` or the `||` operator, I removed the former flag. In other cases, where `PrintOpto && VerifyLoopOptimizations` was the only flag, it was replaced with `TraceLoopOpts`.
>
> **Test Result**: Link to [GitHub Action](https://github.com/sarannat/jdk/actions/runs/13723071055) run on commit [91ecc51](https://github.com/sarannat/jdk/commit/91ecc5190ce31da94bded4de210136f337286e69)

Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision:

  JDK-8330469: Addressing review comments by removing TraceLoopOpts and some dump()

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23959/files
  - new: https://git.openjdk.org/jdk/pull/23959/files/91ecc519..b7216ca6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23959&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23959&range=00-01

  Stats: 30 lines in 3 files changed: 0 ins; 30 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/23959.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23959/head:pull/23959

PR: https://git.openjdk.org/jdk/pull/23959

From chagedorn at openjdk.org Tue Mar 11 12:28:53 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 11 Mar 2025 12:28:53 GMT
Subject: RFR: 8351345: [IR Framework] Improve reported disabled IR verification messages [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 11 Mar 2025 09:39:28 GMT, Hamlin Li wrote:

>> Hi,
>> Can you help to review this trivial patch?
>> Currently, it only reports that some flag is non-whitelisted, but does not print out the flag explicitly, but it's helpful to do so.
>>
>> Thanks!
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   minor

That looks good to me, thanks for all the updates!

-------------

Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23931#pullrequestreview-2674266002

From bulasevich at openjdk.org Tue Mar 11 12:37:10 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Tue, 11 Mar 2025 12:37:10 GMT
Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v14]
In-Reply-To:
References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com>
Message-ID:

On Thu, 6 Mar 2025 12:15:52 GMT, Boris Ulasevich wrote:

>> This change relocates mutable data (such as relocations, metadata, jvmci data) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data out of the CodeCache.
>>
>> OOPs were initially moved to a new mutable data blob, but then moved back to the nmethod due to performance issues on dacapo benchmarks on aarch with ShenandoahGC (why Shenandoah: it is the only GC with supports_instruction_patching=false - it requires loading from the oops table in compiled code, which takes three instructions for remote data).
>>
>> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1–2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark.
>>
>> The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod. Example (statistics collected on the CodeCacheStress benchmark):
>> - nmethod_count:134000, total_compilation_time: 510460ms
>> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms,
>> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB
>>
>> Functional testing: jtreg on arm/aarch/x86.
>> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks.
>>
>> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data.
>
> Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits:
>
>  - swap matadata and jvmci data in outputs according to data layout
>  - cleanup
>  - returning oops back to nmethods. jtreg: Ok, performance: Ok. todo: cleanup
>  - Address review comments: cleanup, move fields to avoid padding, fix CodeBlob purge to call os::free, fix nmethod::print, update Layout description
>  - add a separate adrp_movk function to to support targets located more than 4GB away
>  - Force the use of movk in combination with adrp and ldr instructions to address scenarios
>    where os::malloc allocates buffers beyond the typical ±4GB range accessible with adrp
>  - Fixing TestFindInstMemRecursion test fail with XX:+StressReflectiveCode option:
>    _relocation_size can exceed 64Kb, in this case _metadata_offset do not fit into int16.
>    Fix: use _oops_size int16 field to calculate metadata offset
>  - removing dead code
>  - a bit of cleanup and addressing review suggestions
>  - rework movoop for not_supports_instruction_patching case: correcting in ldr_constant and relocations fixup
>  - ... and 5 more: https://git.openjdk.org/jdk/compare/cfab88b1...bc8c590c

Let me integrate. Many thanks to the reviewers!
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2714007490

From bulasevich at openjdk.org Tue Mar 11 12:37:10 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Tue, 11 Mar 2025 12:37:10 GMT
Subject: Integrated: 8343789: Move mutable nmethod data out of CodeCache
In-Reply-To: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com>
References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com>
Message-ID: <_D07yT4Q2ecC9GfMFhXwAw4ClYEiy5xoF08Nb1fJu5E=.eac392d4-79b4-4547-9e2d-1d27715a9b14@github.com>

On Tue, 1 Oct 2024 02:10:37 GMT, Boris Ulasevich wrote:

> This change relocates mutable data (such as relocations, metadata, jvmci data) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data out of the CodeCache.
>
> OOPs were initially moved to a new mutable data blob, but then moved back to the nmethod due to performance issues on dacapo benchmarks on aarch with ShenandoahGC (why Shenandoah: it is the only GC with supports_instruction_patching=false - it requires loading from the oops table in compiled code, which takes three instructions for remote data).
>
> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1–2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark.
>
> The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod. Example (statistics collected on the CodeCacheStress benchmark):
> - nmethod_count:134000, total_compilation_time: 510460ms
> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms,
> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB
>
> Functional testing: jtreg on arm/aarch/x86.
> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. > > Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. This pull request has now been integrated. Changeset: 83de3404 Author: Boris Ulasevich URL: https://git.openjdk.org/jdk/commit/83de34041eacdf987988364487712c79bbb4c235 Stats: 192 lines in 7 files changed: 87 ins; 37 del; 68 mod 8343789: Move mutable nmethod data out of CodeCache Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/21276 From mli at openjdk.org Tue Mar 11 12:42:07 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Mar 2025 12:42:07 GMT Subject: RFR: 8351345: [IR Framework] Improve reported disabled IR verification messages [v4] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 12:26:31 GMT, Christian Hagedorn wrote: > That looks good to me, thanks for all the updates! Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23931#issuecomment-2714033224 From duke at openjdk.org Tue Mar 11 13:35:55 2025 From: duke at openjdk.org (Johannes Graham) Date: Tue, 11 Mar 2025 13:35:55 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v41] In-Reply-To: References: Message-ID: > An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. > > This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. 
>
> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items:
>  - Bounds optimization of xor
>  - A check for `x ^ x = 0`
>  - Explicit testing of xor over booleans.
>
> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types.
>
> ---------
> ### Progress
> - [x] Change must not contain extraneous whitespace
> - [x] Commit message must refer to an issue
> - [ ] Change must be properly reviewed (2 reviews required, with at least 2 [Reviewers](https://openjdk.org/bylaws#reviewer))
>
> ### Reviewers
>  * [Quan Anh Mai](https://openjdk.org/census#qamai) (@merykitty - Committer): re-review required (review applies to [cf779497](https://git.openjdk.org/jdk/pull/23089/files/cf77949776f7a4601268c7291a5743c2eb164186))

Johannes Graham has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 56 commits:

 - Merge branch 'openjdk:master' into xor_const
 - invert comparison in tests
 - update bug numbers and summary
 - add test of random ranges
 - consistency
 - Merge branch 'openjdk:master' into xor_const
 - widen range of test values; add missing comment
 - a few more tests
 - add comments

   Co-authored-by: Emanuel Peter
 - update tests
 - ... and 46 more: https://git.openjdk.org/jdk/compare/af9af7e9...9532f957

-------------

Changes: https://git.openjdk.org/jdk/pull/23089/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=40
  Stats: 527 lines in 5 files changed: 476 ins; 25 del; 26 mod
  Patch: https://git.openjdk.org/jdk/pull/23089.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089

PR: https://git.openjdk.org/jdk/pull/23089

From duke at openjdk.org  Tue Mar 11 13:38:15 2025
From: duke at openjdk.org (Johannes Graham)
Date: Tue, 11 Mar 2025 13:38:15 GMT
Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v27]
In-Reply-To:
References:
Message-ID:

On Thu, 13 Feb 2025 08:43:39 GMT, Emanuel Peter wrote:

>> Johannes Graham has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   formatting, remove commented tests
>
> I also see that https://github.com/openjdk/jdk/pull/2776 and https://github.com/openjdk/jdk/pull/4136 were mentioned here. Both of those are related and have no IR tests of their own, yikes! We have to ensure that we cover those old cases, and the new ones here, so that we do not get any accidental regressions.
>
> Maybe that's all already covered in other existing tests or the tests you added. Can you please provide a summary of all tests and what cases they cover in the PR description? It would help a lot for reviewing.

Hi @eme64, do you have any more recommendations on this?

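For readers following the bounds discussion in 8347645, here is a minimal Java sketch of the kind of reasoning involved. It is not the HotSpot code from the PR: the class and method names (`XorBoundDemo`, `xorUpperBound`, `holdsForAllFourBitRanges`) are made up for illustration, and it only covers non-negative `int` ranges. The idea is that if `0 <= a <= hiA` and `0 <= b <= hiB`, then `a ^ b` cannot set any bit above the highest bit set in `hiA | hiB`. The exhaustive loop over 4-bit ranges loosely mirrors the testing approach described for `test_xor_node.cpp`.

```java
public class XorBoundDemo {
    // Hypothetical helper: an upper bound for a ^ b, assuming
    // 0 <= a <= hiA and 0 <= b <= hiB (both limits non-negative).
    static int xorUpperBound(int hiA, int hiB) {
        int bits = hiA | hiB;
        if (bits == 0) {
            return 0; // both operands must be 0, so the xor is 0
        }
        // All bits up to and including the highest set bit may be set.
        return (Integer.highestOneBit(bits) << 1) - 1;
    }

    // Exhaustively check the bound for every pair of 4-bit ranges [0, hiA] x [0, hiB].
    static boolean holdsForAllFourBitRanges() {
        for (int hiA = 0; hiA < 16; hiA++) {
            for (int hiB = 0; hiB < 16; hiB++) {
                int bound = xorUpperBound(hiA, hiB);
                for (int a = 0; a <= hiA; a++) {
                    for (int b = 0; b <= hiB; b++) {
                        if ((a ^ b) > bound) {
                            return false;
                        }
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(xorUpperBound(5, 9));        // 15
        System.out.println(holdsForAllFourBitRanges()); // true
    }
}
```

When both inputs are constants the exact value `a ^ b` is known, which is why it makes sense for constant folding to take priority over this coarser bound.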
------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2714282457 From mdoerr at openjdk.org Tue Mar 11 13:53:04 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 11 Mar 2025 13:53:04 GMT Subject: RFR: 8350866: [x86] Add C1 intrinsics for CRC32-C [v4] In-Reply-To: References: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> Message-ID: On Tue, 11 Mar 2025 10:56:18 GMT, David Linus Briemann wrote: >> Local benchmarks show good improvements for the crc32c intrinsification: >> >> >> without intrinsic (master): >> >> >> $JDK/java -DmsgSize=5120 -XX:TieredStopAtLevel=1 -Xcomp TestCRC32C 300000 >> offset = 0 >> msgSize = 5120 bytes >> iters = 300000 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(byte[]) runtime = 1.186507782 seconds >> CRC32C.update(byte[]) throughput = 1294.5553525244388 MB/s >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(ByteBuffer) runtime = 1.355515648 seconds >> CRC32C.update(ByteBuffer) throughput = 1133.1481139788364 MB/s >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- >> >> >> with intrinsic: >> >> >> $JDK/java -DmsgSize=5120 -XX:TieredStopAtLevel=1 -Xcomp TestCRC32C 300000 >> offset = 0 >> msgSize = 5120 bytes >> iters = 300000 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(byte[]) runtime = 0.065003188 seconds >> CRC32C.update(byte[]) throughput = 23629.610289267657 MB/s >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(ByteBuffer) runtime = 0.072310133 seconds >> CRC32C.update(ByteBuffer) throughput = 
21241.836189127185 MB/s
>> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8
>> -------------------------------------------------------
>
> David Linus Briemann has updated the pull request incrementally with two additional commits since the last revision:
>
>  - fix
>  - address review comments

LGTM. Thanks!

-------------

Marked as reviewed by mdoerr (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23826#pullrequestreview-2674615775

From mli at openjdk.org  Tue Mar 11 14:13:27 2025
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 11 Mar 2025 14:13:27 GMT
Subject: RFR: 8351662: [Test] RISC-V: enable bunch of IR test
Message-ID:

Hi,
Can you help to review this patch?

A number of IR tests are not enabled on RISC-V. It is worth enabling them, because doing so helps to:
1. cover the test gap
2. find potentially missing intrinsics.

This patch also changes the CPU feature checks from `vm.opt.UseXxx` to `xxx`, which makes these tests easier to find in the future.

NOTE: Some other tests should also be enabled, but they currently fail when simply enabled. They will be investigated further and may be enabled in later PRs, with additional code changes or implementation work if feasible.

Thanks!

-------------

Commit messages:
 - initial commit

Changes: https://git.openjdk.org/jdk/pull/23985/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23985&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8351662
  Stats: 247 lines in 32 files changed: 22 ins; 6 del; 219 mod
  Patch: https://git.openjdk.org/jdk/pull/23985.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23985/head:pull/23985

PR: https://git.openjdk.org/jdk/pull/23985

From rcastanedalo at openjdk.org  Tue Mar 11 14:15:06 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 11 Mar 2025 14:15:06 GMT
Subject: RFR: 8330469: C2: Remove or change "PrintOpto && VerifyLoopOptimizations" as printing code condition [v2]
In-Reply-To: <2QTYQCodAeUq8jIvqiTmBELmgkdDck_fYLX6if2NHcU=.18d3549a-76a2-482a-bebe-3e379f1db82b@github.com>
References: <0u5jTfXdZR63O78QixBJEql2WKWLvcyaeqsN_gAL3OA=.40f718c3-b99e-4183-80e1-fb3b5dc6074e@github.com> <2QTYQCodAeUq8jIvqiTmBELmgkdDck_fYLX6if2NHcU=.18d3549a-76a2-482a-bebe-3e379f1db82b@github.com>
Message-ID:

On Tue, 11 Mar 2025 11:10:10 GMT, Saranya Natarajan wrote:

>> **Issue:** There are currently 9 occurrences where we guard printing code with PrintOpto && VerifyLoopOptimizations. This flag combo is never really used in practice.
>>
>> **Solution**: I analysed the 9 occurrence. In cases, where the flag `PrintOpto && VerifyLoopOptimizations` was followed by flag `TraceLoopOpts` with `else if` or `|| operator` I removed the former flag. In other cases, where `PrintOpto && VerifyLoopOptimizations` was the only flag, I was replaced with `TraceLoopOpts`.
>> >> **Test Result**: Link to [GitHub Action](https://github.com/sarannat/jdk/actions/runs/13723071055) run on commit [91ecc51](https://github.com/sarannat/jdk/commit/91ecc5190ce31da94bded4de210136f337286e69) > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8330469: Addressing review comments by removing TraceLoopOpts and some dump() Looks good to me, thanks for simplifying this code! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23959#pullrequestreview-2674715849 From jiangli at openjdk.org Tue Mar 11 15:28:25 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 11 Mar 2025 15:28:25 GMT Subject: RFR: 8350982: -server|-client causes fatal exception on static JDK [v3] In-Reply-To: References: Message-ID: > Please review the `Arguments::parse_each_vm_init_arg` change to ignore`-server|-client` options, which avoids unrecognized option error on static JDK. > > On regular JDK, '-server|-client' options are processed/removed from command-line arguments by `CheckJvmType` during `CreateExecutionEnvironment`. That happens before `Arguments::parse_each_vm_init_arg` is called. With jvm.cfg setting, only server vm is known and client is ignored. So specifying '-server' and '-client' in command-line is really a no-op. > > On static JDK, the VM is statically linked with the launcher, and `CreateExecutionEnvironment` & `CheckJvmType` are not called. As the result, `Arguments::parse_each_vm_init_arg` could see `-server|-client` when running on static JDK, if the options are specified in the command line. Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: Remove @bug and update @summary. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23881/files - new: https://git.openjdk.org/jdk/pull/23881/files/3189513d..3d7331f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23881&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23881&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23881.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23881/head:pull/23881 PR: https://git.openjdk.org/jdk/pull/23881 From jiangli at openjdk.org Tue Mar 11 15:28:25 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 11 Mar 2025 15:28:25 GMT Subject: RFR: 8350982: -server|-client causes fatal exception on static JDK In-Reply-To: <7YqG7ryKpJEk6hBnKnTj3Y_O2p_9x7XfT0M6jNw7nQg=.c5b75d99-9ce0-4898-a174-3646cfc4402a@github.com> References: <1UivDxf7iNhuhkTsh0S60VECAnkYkH4HQrYMmlCrZy0=.179c743b-4495-498c-b4ab-9fc0efce9467@github.com> <7YqG7ryKpJEk6hBnKnTj3Y_O2p_9x7XfT0M6jNw7nQg=.c5b75d99-9ce0-4898-a174-3646cfc4402a@github.com> Message-ID: <6XSLrtEyc3CqYlhvMUABGsDuIffNP029A7vovct2pD8=.8a4dc0c1-dea8-4da7-8722-149a1b8af3b0@github.com> On Mon, 10 Mar 2025 19:54:09 GMT, Alan Bateman wrote: >> Jiangli and I chatted about this today. We don't think there will be developers looking to specify -server or -client to a static image, instead this is more about the tests. So we think the best think is to look at the tests that still specify -server and see if it can be dropped. Some of the tests (say for C2) might be better off using `@requires vm.compiler2.enabled` or `@requires vm.flavor == "server"`. > >> @AlanBateman @dholmes-ora @iklam Do you have any other comments/questions about the change? @vnkozlov or others from compiler side, can you please take a look of the change as well? Thanks > > I wasn't initially sure about XShareAuto.java but I see the exchange between you and Ioi so I think all good. 
> > @AlanBateman @dholmes-ora @iklam Do you have any other comments/questions about the change? @vnkozlov or others from compiler side, can you please take a look of the change as well? Thanks > > Please remove `* @bug 8005933` from XShareAuto.java as it no longer applies to that bug ID. Done. Also updated @summary. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23881#issuecomment-2714753977 From roland at openjdk.org Tue Mar 11 15:43:58 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 11 Mar 2025 15:43:58 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 08:13:55 GMT, Emanuel Peter wrote: > @rwestrel @galderz Are you two still working on this or is it ready for someone else to review? @eme64 I believe, it is ready for someone else to review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2714815018 From sviswanathan at openjdk.org Tue Mar 11 15:47:58 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 11 Mar 2025 15:47:58 GMT Subject: RFR: 8350840: C2: x64 Assembler::vpcmpeqq assert: failed: XMM register should be 0-15 In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 10:46:39 GMT, Jatin Bhateja wrote: > This bug fix patch addressed an assertion failure due to unexpected register operand encoding. > AVX2 flavour of instruction "vpcmpeqq" expects to operate over XMM registers from lower register bank (0-15), in this case, the register mask associated with the destination vector operand of the matcher pattern also includes registers from the higher bank. > > The issue can be reliably reproduced if we modify the static allocation order of XMM register through AD file change. > Existing bug [JDK-8343294](https://bugs.openjdk.org/browse/JDK-8343294) already tracks the requirement to randomize the allocation ordering. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23979#pullrequestreview-2675138679 From chagedorn at openjdk.org Tue Mar 11 15:54:54 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 11 Mar 2025 15:54:54 GMT Subject: RFR: 8330469: C2: Remove or change "PrintOpto && VerifyLoopOptimizations" as printing code condition [v2] In-Reply-To: <2QTYQCodAeUq8jIvqiTmBELmgkdDck_fYLX6if2NHcU=.18d3549a-76a2-482a-bebe-3e379f1db82b@github.com> References: <0u5jTfXdZR63O78QixBJEql2WKWLvcyaeqsN_gAL3OA=.40f718c3-b99e-4183-80e1-fb3b5dc6074e@github.com> <2QTYQCodAeUq8jIvqiTmBELmgkdDck_fYLX6if2NHcU=.18d3549a-76a2-482a-bebe-3e379f1db82b@github.com> Message-ID: On Tue, 11 Mar 2025 11:10:10 GMT, Saranya Natarajan wrote: >> **Issue:** There are currently 9 occurrences where we guard printing code with PrintOpto && VerifyLoopOptimizations. This flag combo is never really used in practice. >> >> **Solution**: I analysed the 9 occurrence. In cases, where the flag `PrintOpto && VerifyLoopOptimizations` was followed by flag `TraceLoopOpts` with `else if` or `|| operator` I removed the former flag. In other cases, where `PrintOpto && VerifyLoopOptimizations` was the only flag, I was replaced with `TraceLoopOpts`. >> >> **Test Result**: Link to [GitHub Action](https://github.com/sarannat/jdk/actions/runs/13723071055) run on commit [91ecc51](https://github.com/sarannat/jdk/commit/91ecc5190ce31da94bded4de210136f337286e69) > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8330469: Addressing review comments by removing TraceLoopOpts and some dump() Thanks for the update, looks good! Can you file an RFE for `TraceSplitIf` and link it to this RFE? Then we can keep track of this :-) ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23959#pullrequestreview-2675176488 From vlivanov at openjdk.org Tue Mar 11 16:50:14 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 11 Mar 2025 16:50:14 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 13:27:00 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]` as a >> consequence, `i` is known to be `[0, arraySize-1]`. The range check >> can be eliminated. >> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. >> >> What I propose here is that when the `CastII` becomes dead, any CFG >> paths that use the `CastII` node is made unreachable. So in pseudo code: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered. >> >> The patch applies this to all `Type` nodes as with 8275202, I also ran >> in some rare corner cases with other types of nodes. The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). 
>> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII)` >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out of bound value occurs. >> >> This is likely fixable some other way (eventhough it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that they are more hiding), I think it would >> make sense to try some other way of approaching them. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23468#pullrequestreview-2675383797 From kvn at openjdk.org Tue Mar 11 16:59:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 11 Mar 2025 16:59:52 GMT Subject: RFR: 8350840: C2: x64 Assembler::vpcmpeqq assert: failed: XMM register should be 0-15 In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 10:46:39 GMT, Jatin Bhateja wrote: > This bug fix patch addressed an assertion failure due to unexpected register operand encoding. > AVX2 flavour of instruction "vpcmpeqq" expects to operate over XMM registers from lower register bank (0-15), in this case, the register mask associated with the destination vector operand of the matcher pattern also includes registers from the higher bank. > > The issue can be reliably reproduced if we modify the static allocation order of XMM register through AD file change. 
> Existing bug [JDK-8343294](https://bugs.openjdk.org/browse/JDK-8343294) already tracks the requirement to randomize the allocation ordering. > > Kindly review and share your feedback. > > Best Regards, > Jatin Looks fine to me too. Let me test it before approval. ------------- PR Review: https://git.openjdk.org/jdk/pull/23979#pullrequestreview-2675423585 From kvn at openjdk.org Tue Mar 11 17:24:56 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 11 Mar 2025 17:24:56 GMT Subject: RFR: 8350866: [x86] Add C1 intrinsics for CRC32-C [v4] In-Reply-To: References: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> Message-ID: On Tue, 11 Mar 2025 10:56:18 GMT, David Linus Briemann wrote: >> Local benchmarks show good improvements for the crc32c intrinsification: >> >> >> without intrinsic (master): >> >> >> $JDK/java -DmsgSize=5120 -XX:TieredStopAtLevel=1 -Xcomp TestCRC32C 300000 >> offset = 0 >> msgSize = 5120 bytes >> iters = 300000 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(byte[]) runtime = 1.186507782 seconds >> CRC32C.update(byte[]) throughput = 1294.5553525244388 MB/s >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(ByteBuffer) runtime = 1.355515648 seconds >> CRC32C.update(ByteBuffer) throughput = 1133.1481139788364 MB/s >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- >> >> >> with intrinsic: >> >> >> $JDK/java -DmsgSize=5120 -XX:TieredStopAtLevel=1 -Xcomp TestCRC32C 300000 >> offset = 0 >> msgSize = 5120 bytes >> iters = 300000 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(byte[]) runtime = 0.065003188 seconds >> CRC32C.update(byte[]) throughput = 23629.610289267657 
MB/s
>> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8
>> -------------------------------------------------------
>> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8
>> CRC32C.update(ByteBuffer) runtime = 0.072310133 seconds
>> CRC32C.update(ByteBuffer) throughput = 21241.836189127185 MB/s
>> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8
>> -------------------------------------------------------
>
> David Linus Briemann has updated the pull request incrementally with two additional commits since the last revision:
>
>  - fix
>  - address review comments

Looks good. Let me test it before approval

-------------

PR Review: https://git.openjdk.org/jdk/pull/23826#pullrequestreview-2675496032

From duke at openjdk.org  Tue Mar 11 19:20:51 2025
From: duke at openjdk.org (Anjian Wen)
Date: Tue, 11 Mar 2025 19:20:51 GMT
Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory
Message-ID:

From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add the RISC-V Unsafe::setMemory intrinsic's generator, generate_unsafe_setmemory.
This intrinsic reduces Unsafe::setMemory time by about 15%-20%.

-------------

Commit messages:
 - RISC-V: Intrinsify Unsafe::setMemory
 - RISC-V: Intrinsify Unsafe::setMemory

Changes: https://git.openjdk.org/jdk/pull/23890/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8351140
  Stats: 49 lines in 1 file changed: 49 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/23890.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890

PR: https://git.openjdk.org/jdk/pull/23890

From duke at openjdk.org  Tue Mar 11 19:20:51 2025
From: duke at openjdk.org (Anjian Wen)
Date: Tue, 11 Mar 2025 19:20:51 GMT
Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory
In-Reply-To:
References:
Message-ID:

On Tue, 4 Mar 2025 09:46:53 GMT, Anjian Wen wrote:

> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add the RISC-V Unsafe::setMemory intrinsic's generator, generate_unsafe_setmemory.
> This intrinsic reduces Unsafe::setMemory time by about 15%-20%.

// Add benchmark Test

import sun.misc.Unsafe;
import java.lang.reflect.Field;

public class UnsafeMemoryTest {
    // Obtain the Unsafe singleton via reflection; it is not accessible directly.
    private static Unsafe getUnsafe() throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        return (Unsafe) f.get(null);
    }

    public static void main(String[] args) {
        try {
            Unsafe unsafe = getUnsafe();
            long size = 9999L;
            long address = unsafe.allocateMemory(size);
            byte initialValue = 0x4E;
            try {
                // Time 100 million setMemory calls on the same off-heap buffer.
                long start = System.nanoTime();
                for (int i = 0; i < 100000000; i++) {
                    unsafe.setMemory(address, size, initialValue);
                }
                long end = System.nanoTime();
                long totalElapsedTime = end - start;
                System.out.println("elapsed time: " + totalElapsedTime + " ns");
            } finally {
                // Release the off-heap buffer once the measurement is done.
                unsafe.freeMemory(address);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

With this test, the intrinsic appears to reduce the elapsed time by about 15%.

I ran the test above on my RISC-V MuseBook; here are the results:

Original
91142737523 ns
81256485532 ns
81935426870 ns
77492514654 ns
81042094789 ns
85066289810 ns

Optimized
73503057373 ns
75953174376 ns
75200200198 ns
73717004138 ns
72682417915 ns
73935123648 ns

The results show that:
1. The original timings fluctuate considerably, while the optimized version appears more stable.
2. The average improvement of the optimized version is 11%, the maximum improvement is 20%, and the minimum improvement is 3%.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2700047363
PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2703215881

From duke at openjdk.org  Tue Mar 11 19:23:10 2025
From: duke at openjdk.org (Anjian Wen)
Date: Tue, 11 Mar 2025 19:23:10 GMT
Subject: RFR: 8349632: RISC-V: Add Zfa fminm/fmaxm
In-Reply-To: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com>
References: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com>
Message-ID:

On Fri, 7 Feb 2025 06:52:13 GMT, Anjian Wen wrote:

> Add the RISC-V Zfa extension instructions fminm/fmaxm.
> These two new floating-point instructions handle NaN inputs directly, which reduces the number of instructions needed to compute min or max.

> Hi @Anjian-Wen, welcome to this OpenJDK project and thanks for contributing!
>
> We do not recognize you as [Contributor](https://openjdk.java.net/bylaws#contributor) and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow [the instructions](https://oca.opensource.oracle.com/). Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing `/signed` in a comment in this pull request.
>
> If you already are an OpenJDK [Author](https://openjdk.java.net/bylaws#author), [Committer](https://openjdk.java.net/bylaws#committer) or [Reviewer](https://openjdk.java.net/bylaws#reviewer), please click [here](https://bugs.openjdk.java.net/secure/CreateIssue.jspa?pid=11300&issuetype=1) to open a new issue so that we can record that fact. Please use "Add GitHub user Anjian-Wen" as summary for the issue.
>
> If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing `/covered` in a comment in this pull request.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23509#issuecomment-2664522249

From duke at openjdk.org  Tue Mar 11 19:23:10 2025
From: duke at openjdk.org (Anjian Wen)
Date: Tue, 11 Mar 2025 19:23:10 GMT
Subject: RFR: 8349632: RISC-V: Add Zfa fminm/fmaxm
Message-ID: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com>

Add the RISC-V Zfa extension instructions fminm/fmaxm.
These two new floating-point instructions handle NaN inputs directly, which reduces the number of instructions needed to compute min or max.

-------------

Commit messages:
 - 8349632: RISC-V: Add Zfa fminm/fmaxm
 - JDK-8349632: RISC-V: Add Zfa fminm/fmaxm
 - 8349632: RISC-V: Add Zfa fminm/fmaxm
 - JDK-8349632: RISCV: Add Zfa fminm/fmaxm
 - 8349632:RISC-V: Add Zfa fminm/fmaxm

Changes: https://git.openjdk.org/jdk/pull/23509/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23509&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8349632
  Stats: 80 lines in 2 files changed: 80 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/23509.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23509/head:pull/23509

PR: https://git.openjdk.org/jdk/pull/23509

From duke at openjdk.org  Tue Mar 11 19:23:10 2025
From: duke at openjdk.org (Anjian Wen)
Date: Tue, 11 Mar 2025 19:23:10 GMT
Subject: RFR: 8349632: RISC-V: Add Zfa fminm/fmaxm
In-Reply-To:
References: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> Message-ID: On Tue, 18 Feb 2025 03:27:06 GMT, Anjian Wen wrote: > Hi @Anjian-Wen, welcome to this OpenJDK project and thanks for contributing! > > We do not recognize you as [Contributor](https://openjdk.java.net/bylaws#contributor) and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow [the instructions](https://oca.opensource.oracle.com/). Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing `/signed` in a comment in this pull request. > > If you already are an OpenJDK [Author](https://openjdk.java.net/bylaws#author), [Committer](https://openjdk.java.net/bylaws#committer) or [Reviewer](https://openjdk.java.net/bylaws#reviewer), please click [here](https://bugs.openjdk.java.net/secure/CreateIssue.jspa?pid=11300&issuetype=1) to open a new issue so that we can record that fact. Please use "Add GitHub user Anjian-Wen" as summary for the issue. > > If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing `/covered` in a comment in this pull request. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23509#issuecomment-2693202499 From duke at openjdk.org Tue Mar 11 19:23:11 2025 From: duke at openjdk.org (Anjian Wen) Date: Tue, 11 Mar 2025 19:23:11 GMT Subject: RFR: 8349632: RISC-V: Add Zfa fminm/fmaxm In-Reply-To: References: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> Message-ID: On Fri, 21 Feb 2025 04:03:11 GMT, Fei Yang wrote: >> Add RISC-V Zfa extension fminm/fmaxm >> These two new floating-point instructions can handle NaN inputs directly, which reduces the number of instructions needed to calculate min or max > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2161: > >> 2159: void C2_MacroAssembler::minmmaxm_fp(FloatRegister dst, FloatRegister src1, FloatRegister src2, >> 2160: bool is_double, bool is_min) { >> 2161: assert_different_registers(dst, src1, src2); > > Do FMINM.S and FMAXM.S have a constraint on the registers? Thanks for replying. According to the Zfa doc (https://github.com/riscv/riscv-isa-manual/blob/main/src/zfa.adoc), FMINM.S/FMAXM.S are defined like the FMIN.S and FMAX.S instructions. I have not found any constraints on the registers in the ISA manual; they only need normal floating-point registers.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23509#discussion_r1964905518 From fyang at openjdk.org Tue Mar 11 19:23:11 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 11 Mar 2025 19:23:11 GMT Subject: RFR: 8349632: RISC-V: Add Zfa fminm/fmaxm In-Reply-To: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> References: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> Message-ID: On Fri, 7 Feb 2025 06:52:13 GMT, Anjian Wen wrote: > Add RISC-V Zfa extension fminm/fmaxm > These two new floating-point instructions can handle NaN inputs directly, which reduces the number of instructions needed to calculate min or max src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2159: > 2157: } > 2158: > 2159: void C2_MacroAssembler::minmmaxm_fp(FloatRegister dst, FloatRegister src1, FloatRegister src2, There is no need to have this macro-assembler routine. You can call `fminm_s/d` / `fmaxm_s/d` directly at the call sites. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2161: > 2159: void C2_MacroAssembler::minmmaxm_fp(FloatRegister dst, FloatRegister src1, FloatRegister src2, > 2160: bool is_double, bool is_min) { > 2161: assert_different_registers(dst, src1, src2); Do FMINM.S and FMAXM.S have a constraint on the registers?
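For readers following the NaN point in the PR description quoted above, here is a minimal Java sketch of why FMINM can save instructions. It is not code from the PR: the class and method names are hypothetical, and the two helpers only model the documented instruction semantics (FMIN.S returns the non-NaN operand when exactly one input is NaN; FMINM.S returns NaN when either input is NaN, which is what `Math.min` requires).

```java
public class MinNaNSemantics {
    // Models FMIN.S: if exactly one operand is NaN, the other operand is returned.
    // A JIT using this instruction for Math.min needs extra NaN checks around it.
    static float fminLike(float a, float b) {
        if (Float.isNaN(a)) return b;
        if (Float.isNaN(b)) return a;
        return Math.min(a, b);
    }

    // Models FMINM.S: the result is NaN if either operand is NaN.
    // Math.min already has exactly this behaviour, so no extra checks are needed.
    static float fminmLike(float a, float b) {
        return Math.min(a, b);
    }

    public static void main(String[] args) {
        System.out.println(fminLike(Float.NaN, 1.0f));   // non-NaN operand wins
        System.out.println(fminmLike(Float.NaN, 1.0f));  // NaN propagates
        System.out.println(Math.min(Float.NaN, 1.0f));   // Java semantics: NaN
    }
}
```

The same reasoning applies to FMAX/FMAXM and to the double-precision variants.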
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23509#discussion_r1966377744 PR Review Comment: https://git.openjdk.org/jdk/pull/23509#discussion_r1964799700 From kxu at openjdk.org Tue Mar 11 19:49:41 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 11 Mar 2025 19:49:41 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID:
> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495), which was [first merged](https://git.openjdk.org/jdk/pull/20754) and then backed out due to a regression. This patch redoes the feature and fixes the bit shift overflow problem. For more information, please refer to the previous PR.
>
> When constantizing multiplications (possibly in the form of `lshift`s), the multiplier is widened to long and later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, which overflows an int, using long gives an unexpected result (i.e., `(1 << 32) = 1` but `(int) (1L << 32) = 0`).
>
> The following was implemented to address this issue.
>
>     if (UseNewCode2) {
>       *multiplier = bt == T_INT
>           ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows
>           : ((jlong) 1) << con->get_int();
>     } else {
>       *multiplier = ((jlong) 1 << con->get_int());
>     }
>
> Two new bitshift overflow tests were added.
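The shift overflow described above can be reproduced in plain Java. The following is a small illustrative sketch, not code from the patch; the class and method names are hypothetical. It relies only on the Java rule that an `int` shift uses the low 5 bits of the shift distance, while a `long` shift uses the low 6 bits.

```java
public class ShiftOverflowDemo {
    // Multiplier implied by `x << shift` under Java int semantics:
    // the shift distance is masked to its low 5 bits, so 32 behaves like 0.
    static int intMultiplier(int shift) {
        return 1 << shift;
    }

    // Multiplier computed in long first and narrowed to int afterwards:
    // 1L << 32 is 2^32, whose low 32 bits are all zero.
    static int narrowedLongMultiplier(int shift) {
        return (int) (1L << shift);
    }

    public static void main(String[] args) {
        int x = 7;
        System.out.println(x << 32);                     // same as x << 0
        System.out.println(intMultiplier(32));           // 1
        System.out.println(narrowedLongMultiplier(32));  // 0
    }
}
```

For shift distances 0 through 31 both helpers agree, which is why the mismatch only shows up at a distance of exactly 32 (and, more generally, at distances whose low 5 bits differ from their low 6 bits).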
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: add micro benchmark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23506/files - new: https://git.openjdk.org/jdk/pull/23506/files/5b972e9a..851bfb2f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=05-06 Stats: 202 lines in 1 file changed: 202 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From kxu at openjdk.org Tue Mar 11 19:54:54 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 11 Mar 2025 19:54:54 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 19:49:41 GMT, Kangcheng Xu wrote:
>> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495), which was [first merged](https://git.openjdk.org/jdk/pull/20754) and then backed out due to a regression. This patch redoes the feature and fixes the bit shift overflow problem. For more information, please refer to the previous PR.
>>
>> When constantizing multiplications (possibly in the form of `lshift`s), the multiplier is widened to long and later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, which overflows an int, using long gives an unexpected result (i.e., `(1 << 32) = 1` but `(int) (1L << 32) = 0`).
>>
>> The following was implemented to address this issue.
>>
>>     if (UseNewCode2) {
>>       *multiplier = bt == T_INT
>>           ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows
>>           : ((jlong) 1) << con->get_int();
>>     } else {
>>       *multiplier = ((jlong) 1 << con->get_int());
>>     }
>>
>> Two new bitshift overflow tests were added.
>
> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision:
>
>   add micro benchmark

I ran some microbenchmarks. The result is as expected: the unoptimized baseline scales linearly with the number of terms, while the optimized version remains constant.

@eme64 Please see [`test/micro/org/openjdk/bench/vm/compiler/SerialAdditions.java`](https://github.com/openjdk/jdk/pull/23506/files#:~:text=test/micro/org/openjdk/bench/vm/compiler/SerialAdditions.java) for details. This is my first time using the JMH framework, so please let me know if I made any mistakes. Thanks!

---

![](https://github.com/user-attachments/assets/f24c4443-16cc-4efc-9e41-6888f75114a8)

**baseline**:

    $ CONF=linux-x86_64-server-release make test TEST=micro:org.openjdk.bench.vm.compiler.SerialAdditions

    Benchmark                      Mode  Cnt  Score   Error  Units
    SerialAdditions.addIntsMixed   avgt   12  8.231 ± 0.767  ns/op
    SerialAdditions.addIntsTo02    avgt   12  0.516 ± 0.074  ns/op
    SerialAdditions.addIntsTo04    avgt   12  0.496 ± 0.033  ns/op
    SerialAdditions.addIntsTo05    avgt   12  0.492 ± 0.031  ns/op
    SerialAdditions.addIntsTo06    avgt   12  0.571 ± 0.029  ns/op
    SerialAdditions.addIntsTo08    avgt   12  0.711 ± 0.038  ns/op
    SerialAdditions.addIntsTo16    avgt   12  1.245 ± 0.049  ns/op
    SerialAdditions.addIntsTo23    avgt   12  1.793 ± 0.054  ns/op
    SerialAdditions.addIntsTo32    avgt   12  2.635 ± 0.033  ns/op
    SerialAdditions.addIntsTo42    avgt   12  3.726 ± 0.092  ns/op
    SerialAdditions.addIntsTo64    avgt   12  6.468 ± 0.215  ns/op
    SerialAdditions.addLongsMixed  avgt   12  6.462 ± 0.202  ns/op
    SerialAdditions.addLongsTo02   avgt   12  0.467 ± 0.014  ns/op
    SerialAdditions.addLongsTo04   avgt   12  0.466 ± 0.022  ns/op
    SerialAdditions.addLongsTo05   avgt   12  0.495 ± 0.024  ns/op
    SerialAdditions.addLongsTo06   avgt   12  0.553 ± 0.023  ns/op
    SerialAdditions.addLongsTo08   avgt   12  0.688 ± 0.032  ns/op
    SerialAdditions.addLongsTo16   avgt   12  1.201 ± 0.034  ns/op
    SerialAdditions.addLongsTo23   avgt   12  1.736 ± 0.053  ns/op
    SerialAdditions.addLongsTo32   avgt   12  2.679 ± 0.088  ns/op
    SerialAdditions.addLongsTo42   avgt   12  3.742 ± 0.081  ns/op
    SerialAdditions.addLongsTo64   avgt   12  6.359 ± 0.056  ns/op

**with this patch**:

    $ CONF=linux-x86_64-server-release make test TEST=micro:org.openjdk.bench.vm.compiler.SerialAdditions TEST_VM_OPTS="-XX:+UnlockDiagnosticVMOptions -XX:+UseNewCode"

    Benchmark                      Mode  Cnt  Score   Error  Units
    SerialAdditions.addIntsMixed   avgt   12  0.933 ± 0.045  ns/op
    SerialAdditions.addIntsTo02    avgt   12  0.459 ± 0.015  ns/op
    SerialAdditions.addIntsTo04    avgt   12  0.468 ± 0.018  ns/op
    SerialAdditions.addIntsTo05    avgt   12  0.467 ± 0.022  ns/op
    SerialAdditions.addIntsTo06    avgt   12  0.455 ± 0.020  ns/op
    SerialAdditions.addIntsTo08    avgt   12  0.475 ± 0.027  ns/op
    SerialAdditions.addIntsTo16    avgt   12  0.470 ± 0.018  ns/op
    SerialAdditions.addIntsTo23    avgt   12  0.469 ± 0.028  ns/op
    SerialAdditions.addIntsTo32    avgt   12  0.474 ± 0.017  ns/op
    SerialAdditions.addIntsTo42    avgt   12  0.476 ± 0.012  ns/op
    SerialAdditions.addIntsTo64    avgt   12  0.480 ± 0.017  ns/op
    SerialAdditions.addLongsMixed  avgt   12  1.051 ± 0.040  ns/op
    SerialAdditions.addLongsTo02   avgt   12  0.460 ± 0.027  ns/op
    SerialAdditions.addLongsTo04   avgt   12  0.473 ± 0.028  ns/op
    SerialAdditions.addLongsTo05   avgt   12  0.463 ± 0.024  ns/op
    SerialAdditions.addLongsTo06   avgt   12  0.457 ± 0.022  ns/op
    SerialAdditions.addLongsTo08   avgt   12  0.523 ± 0.074  ns/op
    SerialAdditions.addLongsTo16   avgt   12  0.460 ± 0.013  ns/op
    SerialAdditions.addLongsTo23   avgt   12  0.468 ± 0.018  ns/op
    SerialAdditions.addLongsTo32   avgt   12  0.523 ± 0.078  ns/op
    SerialAdditions.addLongsTo42   avgt   12  0.470 ± 0.019  ns/op
    SerialAdditions.addLongsTo64   avgt   12  0.475 ± 0.032  ns/op

---
------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-2715539309 From sparasa at openjdk.org Tue Mar 11 22:38:14 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 11 Mar 2025 22:38:14 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v8] In-Reply-To: References: Message-ID: > The goal of this PR is to generate code using APX NDD instructions.
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: remove unused expand blocks;ndd version of orL_rReg_castP2X ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23501/files - new: https://git.openjdk.org/jdk/pull/23501/files/630acb8f..e02ff23c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=06-07 Stats: 43 lines in 1 file changed: 24 ins; 0 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/23501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23501/head:pull/23501 PR: https://git.openjdk.org/jdk/pull/23501 From dlong at openjdk.org Tue Mar 11 22:58:57 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 11 Mar 2025 22:58:57 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> Message-ID: On Tue, 11 Mar 2025 09:03:45 GMT, Matthias Baesken wrote: >> src/hotspot/share/opto/block.cpp line 1617: >> >>> 1615: float f_from_pct = (100 * freq) / b->_freq; >>> 1616: float f_to_pct = (100 * freq) / target->_freq; >>> 1617: int from_pct = (f_from_pct < (float)INT_MAX) ? (int)f_from_pct : INT_MAX; >> >> I think (float)INT_MAX is problematic. Due to rounding, isn't the result actually greater than INT_MAX? >> Does it even make sense to have a "pct" that is greater than 100 here? >> Do we want `int from_pct = MIN2((double)INT_MAX, (double)f_from_pct);` or maybe >> `int from_pct = MIN2((100.0, (double)f_from_pct);`? > >> Adding to @dean-long's questions, I was wondering how we can get to a 9.97582e+36 value (since it is a runtime ubsan issue): is this a result of successive rounding-ups or there is perhaps an upstream issue? > > That's a good question. 
I could add a bit of tracing or asserts to find out more about this. > It seems this is an aarch64 related thing because on (Linux) x86_64/ppc64le I never observed this. > Do you think those high values are not expected ? Also, to compute `from_pct`, we end up multiplying and then dividing by the same value `b->_freq`, which cancel out and simplify to `100 * b->succ_prob(j)`. Furthermore, succ_prob() should always return a value between 0.0 and 1.0, so the real problem is probably only `to_pct` and very small values of `target->_freq`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r1990235113 From sparasa at openjdk.org Tue Mar 11 23:41:07 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 11 Mar 2025 23:41:07 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v9] In-Reply-To: References: Message-ID: <__Zg3Qut6b9fjhTaPEyAT_sxAVzKsJwPq7xIeXmzg_g=.6d11441f-cbb8-4979-8762-55fea8c2997a@github.com> > The goal of this PR is to generate code using APX NDD instructions. 
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Update copyright; remove extra lines ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23501/files - new: https://git.openjdk.org/jdk/pull/23501/files/e02ff23c..cb21e92a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=07-08 Stats: 16 lines in 1 file changed: 0 ins; 15 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23501/head:pull/23501 PR: https://git.openjdk.org/jdk/pull/23501 From haosun at openjdk.org Wed Mar 12 02:18:11 2025 From: haosun at openjdk.org (Hao Sun) Date: Wed, 12 Mar 2025 02:18:11 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 01:18:57 GMT, Xiaohong Gong wrote: > The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. > > Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. > > This patch added the rearrange support for vector types with small lane count. 
Here are the main changes:
> - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`)
> - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation
> - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one
>
> Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged.
>
> 1) NEON
>
> JMH on panama-vector:vectorIntrinsics:
>
>     Benchmark                     (size)  Mode   Cnt  Units   Before  After     Gain
>     Double128Vector.rearrange     1024    thrpt  30   ops/ms  78.060  578.859   7.42x
>     Double128Vector.sliceUnary    1024    thrpt  30   ops/ms  72.332  1811.664  25.05x
>     Double128Vector.unsliceUnary  1024    thrpt  30   ops/ms  72.256  1812.344  25.08x
>     Float64Vector.rearrange       1024    thrpt  30   ops/ms  77.879  558.797   7.18x
>     Float64Vector.sliceUnary      1024    thrpt  30   ops/ms  70.528  1981.304  28.09x
>     Float64Vector.unsliceUnary    1024    thrpt  30   ops/ms  71.735  1994.168  27.79x
>     Int64Vector.rearrange         1024    thrpt  30   ops/ms  76.374  562.106   7.36x
>     Int64Vector.sliceUnary        1024    thrpt  30   ops/ms  71.680  1190.127  16.60x
>     Int64Vector.unsliceUnary      1024    thrpt  30   ops/ms  71.895  1185.094  16.48x
>     Long128Vector.rearrange       1024    thrpt  30   ops/ms  78.902  579.250   7.34x
>     Long128Vector.sliceUnary      1024    thrpt  30   ops/ms  72.389  747.794   10.33x
>     Long128Vector.unsliceUnary    1024    thrpt  30   ops/ms  71.999  747.848   10.38x
>
> JMH on jdk mainline:
>
>     Benchmark ...

LGTM
------------- Marked as reviewed by haosun (Committer).
PR Review: https://git.openjdk.org/jdk/pull/23790#pullrequestreview-2676660094 From fyang at openjdk.org Wed Mar 12 02:29:58 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 12 Mar 2025 02:29:58 GMT Subject: RFR: 8349632: RISC-V: Add Zfa fminm/fmaxm In-Reply-To: References: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> Message-ID: On Mon, 3 Mar 2025 03:38:30 GMT, Anjian Wen wrote: >>> Hi @Anjian-Wen, welcome to this OpenJDK project and thanks for contributing! >>> >>> We do not recognize you as [Contributor](https://openjdk.java.net/bylaws#contributor) and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow [the instructions](https://oca.opensource.oracle.com/). Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing `/signed` in a comment in this pull request. >>> >>> If you already are an OpenJDK [Author](https://openjdk.java.net/bylaws#author), [Committer](https://openjdk.java.net/bylaws#committer) or [Reviewer](https://openjdk.java.net/bylaws#reviewer), please click [here](https://bugs.openjdk.java.net/secure/CreateIssue.jspa?pid=11300&issuetype=1) to open a new issue so that we can record that fact. Please use "Add GitHub user Anjian-Wen" as summary for the issue. >>> >>> If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing `/covered` in a comment in this pull request. > >> Hi @Anjian-Wen, welcome to this OpenJDK project and thanks for contributing! >> >> We do not recognize you as [Contributor](https://openjdk.java.net/bylaws#contributor) and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow [the instructions](https://oca.opensource.oracle.com/).
Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing `/signed` in a comment in this pull request. >> >> If you already are an OpenJDK [Author](https://openjdk.java.net/bylaws#author), [Committer](https://openjdk.java.net/bylaws#committer) or [Reviewer](https://openjdk.java.net/bylaws#reviewer), please click [here](https://bugs.openjdk.java.net/secure/CreateIssue.jspa?pid=11300&issuetype=1) to open a new issue so that we can record that fact. Please use "Add GitHub user Anjian-Wen" as summary for the issue. >> >> If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing `/covered` in a comment in this pull request. @Anjian-Wen: Do you have GitHub Actions enabled? Usually there are about 16 checks, but you only have 2 checks passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23509#issuecomment-2716229995 From duke at openjdk.org Wed Mar 12 02:38:19 2025 From: duke at openjdk.org (Anjian Wen) Date: Wed, 12 Mar 2025 02:38:19 GMT Subject: RFR: 8349632: RISC-V: Add Zfa fminm/fmaxm [v2] In-Reply-To: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> References: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> Message-ID: > Add RISC-V Zfa extension fminm/fmaxm > These two new floating-point instructions can handle NaN inputs directly, which reduces the number of instructions needed to calculate min or max Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains six additional commits since the last revision:
 - Merge branch 'openjdk:master' into Zfa_dev_branch
 - 8349632: RISC-V: Add Zfa fminm/fmaxm Change macro-assembler routine to directly call in riscv.ad
 - JDK-8349632: RISC-V: Add Zfa fminm/fmaxm add zfa predicate
 - 8349632: RISC-V: Add Zfa fminm/fmaxm delete assert in new add macroAssembly but not the old
 - JDK-8349632: RISCV: Add Zfa fminm/fmaxm delete assert and change fminm/fmaxm to new match rule
 - 8349632:RISC-V: Add Zfa fminm/fmaxm
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23509/files - new: https://git.openjdk.org/jdk/pull/23509/files/b763ee4d..927d0244 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23509&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23509&range=00-01 Stats: 102279 lines in 2523 files changed: 51917 ins; 33646 del; 16716 mod Patch: https://git.openjdk.org/jdk/pull/23509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23509/head:pull/23509 PR: https://git.openjdk.org/jdk/pull/23509 From duke at openjdk.org Wed Mar 12 02:41:52 2025 From: duke at openjdk.org (Anjian Wen) Date: Wed, 12 Mar 2025 02:41:52 GMT Subject: RFR: 8349632: RISC-V: Add Zfa fminm/fmaxm In-Reply-To: References: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> Message-ID: On Mon, 3 Mar 2025 03:38:30 GMT, Anjian Wen wrote: >>> Hi @Anjian-Wen, welcome to this OpenJDK project and thanks for contributing! >>> >>> We do not recognize you as [Contributor](https://openjdk.java.net/bylaws#contributor) and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow [the instructions](https://oca.opensource.oracle.com/). Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing `/signed` in a comment in this pull request.
>>> >>> If you already are an OpenJDK [Author](https://openjdk.java.net/bylaws#author), [Committer](https://openjdk.java.net/bylaws#committer) or [Reviewer](https://openjdk.java.net/bylaws#reviewer), please click [here](https://bugs.openjdk.java.net/secure/CreateIssue.jspa?pid=11300&issuetype=1) to open a new issue so that we can record that fact. Please use "Add GitHub user Anjian-Wen" as summary for the issue. >>> >>> If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing `/covered` in a comment in this pull request. > >> Hi @Anjian-Wen, welcome to this OpenJDK project and thanks for contributing! >> >> We do not recognize you as [Contributor](https://openjdk.java.net/bylaws#contributor) and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow [the instructions](https://oca.opensource.oracle.com/). Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing `/signed` in a comment in this pull request. >> >> If you already are an OpenJDK [Author](https://openjdk.java.net/bylaws#author), [Committer](https://openjdk.java.net/bylaws#committer) or [Reviewer](https://openjdk.java.net/bylaws#reviewer), please click [here](https://bugs.openjdk.java.net/secure/CreateIssue.jspa?pid=11300&issuetype=1) to open a new issue so that we can record that fact. Please use "Add GitHub user Anjian-Wen" as summary for the issue. >> >> If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing `/covered` in a comment in this pull request. > @Anjian-Wen : Do you have github actions enabled? Usually there are about 16 checks, but you only have 2 check passed I have chosen "Allow all actions and reusable workflows" in my repository settings, but it still shows only 1 test. Is there any other option I should choose?
------------- PR Comment: https://git.openjdk.org/jdk/pull/23509#issuecomment-2716248801 From fyang at openjdk.org Wed Mar 12 03:29:57 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 12 Mar 2025 03:29:57 GMT Subject: RFR: 8351662: [Test] RISC-V: enable bunch of IR test In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 14:08:19 GMT, Hamlin Li wrote:
> Hi,
> Can you help to review this patch?
>
> There are a bunch of IR tests not enabled on riscv. It's good to enable them, because enabling them will help to:
> 1. cover the test gap
> 2. find potentially missing intrinsics.
>
> This patch also changes CPU features from `vm.opt.UseXxx` to `xxx`, as this makes these tests easier to find in the future.
>
> NOTE: There are still some other tests that should be enabled, but they currently fail when simply enabled. They will be investigated further and could later be enabled in other PRs with additional code changes/implementation, if feasible.
>
> Thanks!

LGTM. Thanks for doing this.
------------- Marked as reviewed by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23985#pullrequestreview-2676776640 From jbhateja at openjdk.org Wed Mar 12 03:34:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 12 Mar 2025 03:34:54 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input [v2] In-Reply-To: <7EI9O5thx8AyGis3mApSdjQv9ral_gTncOaybCykXuA=.672a0c2f-17a3-47af-a53b-66c7456b05eb@github.com> References: <7EI9O5thx8AyGis3mApSdjQv9ral_gTncOaybCykXuA=.672a0c2f-17a3-47af-a53b-66c7456b05eb@github.com> Message-ID: On Mon, 10 Mar 2025 21:32:04 GMT, Sandhya Viswanathan wrote: >> test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 71: >> >>> 69: } >>> 70: >>> 71: @Test >> >> Suggestion: >> >> @Test >> @IR(counts = { IRNode.VECTOR_CAST_HF2F, " >0 " }, applyIfCPUFeatureOr = { "avx512vl", "true", "f16c", "true" }) > > Used the IR test from compiler/vectorization/TestFloatConversionsVector.java to include other architectures as well. Thanks @sviswa7 , these operations are also supported on PPC. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1990492821 From jbhateja at openjdk.org Wed Mar 12 03:34:55 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 12 Mar 2025 03:34:55 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input [v2] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 07:50:13 GMT, Jatin Bhateja wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments > > test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 80: > >> 78: } >> 79: >> 80: @Test > > Suggestion: > > /* > * C2 handles i2s conversion by constraining the value range of the integral argument; thus > * argument fed to ConvHF2F is of type T_INT. 
Fix for JDK-8350835 skips over vectorizing such a case > * for now. > */ > @Test > @IR(failOn = { IRNode.VECTOR_CAST_HF2F }, applyIfCPUFeatureOr = { "avx512vl", "true", "f16c", "true" }) I don't see any harm in including the comment suggested above since, as you mentioned, we plan to support these auto-vectorizations in the future. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1990493712 From duke at openjdk.org Wed Mar 12 03:38:25 2025 From: duke at openjdk.org (Anjian Wen) Date: Wed, 12 Mar 2025 03:38:25 GMT Subject: RFR: 8349632: RISC-V: Add Zfa fminm/fmaxm [v3] In-Reply-To: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> References: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> Message-ID: <9Mzh5t5Pr-aMFu-zpFgMd83O8Svzh6Wah1GqgU7Dt80=.cc9b25d9-0f89-4a60-a79c-698d1be0a52f@github.com> > Add RISC-V Zfa extension fminm/fmaxm > These two new floating-point instructions can handle NaN inputs directly, which reduces the number of instructions needed to calculate min or max Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: add temp commit for test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23509/files - new: https://git.openjdk.org/jdk/pull/23509/files/927d0244..faa90708 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23509&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23509&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23509/head:pull/23509 PR: https://git.openjdk.org/jdk/pull/23509 From jbhateja at openjdk.org Wed Mar 12 03:40:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 12 Mar 2025 03:40:57 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of
short input [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 21:26:41 GMT, Sandhya Viswanathan wrote: >> Float.float16ToFloat generates wrong vectorized code in product build and asserts in fastdebug/debug when argument is of type byte, int, or long array. The short term solution is to not auto vectorize in these cases. >> >> Review comments are welcome. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > review comments test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 78: > 76: @IR(counts = {IRNode.VECTOR_CAST_HF2F, "> 0"}, > 77: applyIfOr = {"UseCompactObjectHeaders", "false", "AlignVector", "false"}, > 78: applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}, Can you kindly justify the need for the `UseCompactObjectHeaders` condition? It will mainly impact the pre-loop trip count computation. `AlignVector` should be sufficient since it's a whitelisted option. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1990497448 From duke at openjdk.org Wed Mar 12 03:45:36 2025 From: duke at openjdk.org (Anjian Wen) Date: Wed, 12 Mar 2025 03:45:36 GMT Subject: RFR: 8349632: RISC-V: Add Zfa fminm/fmaxm [v4] In-Reply-To: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> References: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> Message-ID: > Add RISC-V Zfa extension fminm/fmaxm > These two new floating-point instructions can handle NaN inputs directly, which reduces the number of instructions needed to calculate min or max Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: delete useless comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23509/files - new: https://git.openjdk.org/jdk/pull/23509/files/faa90708..249bcd4d
Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23509&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23509&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23509/head:pull/23509 PR: https://git.openjdk.org/jdk/pull/23509 From fyang at openjdk.org Wed Mar 12 04:10:53 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 12 Mar 2025 04:10:53 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 09:46:53 GMT, Anjian Wen wrote: > Following [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add the RISC-V Unsafe::setMemory intrinsic's generator, generate_unsafe_setmemory. This intrinsic reduces Unsafe setMemory time by about 15%-20%. You might want to try this JMH test: `make test TEST="micro:java.lang.foreign.MemorySegmentZeroUnsafe"` ------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2716380194 From duke at openjdk.org Wed Mar 12 07:49:59 2025 From: duke at openjdk.org (Anjian Wen) Date: Wed, 12 Mar 2025 07:49:59 GMT Subject: RFR: 8349632: RISC-V: Add Zfa fminm/fmaxm [v4] In-Reply-To: References: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> Message-ID: On Wed, 12 Mar 2025 03:45:36 GMT, Anjian Wen wrote: >> Add RISC-V Zfa extension fminm/fmaxm >> These two new floating-point instructions can handle NaN inputs directly, which reduces the number of instructions needed to calculate min or max > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > delete useless comment This patch passed gtests and also passed the related test test/jdk/java/lang/Math/MinMax.java in QEMU with -XX:+UseZfa. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23509#issuecomment-2716933559 From
dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 12 Mar 2025 07:57:52 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> Message-ID: On Tue, 11 Mar 2025 22:55:42 GMT, Dean Long wrote: >>> Adding to @dean-long's questions, I was wondering how we can get to a 9.97582e+36 value (since it is a runtime ubsan issue): is this a result of successive rounding-ups or there is perhaps an upstream issue? >> >> That's a good question. I could add a bit of tracing or asserts to find out more about this. >> It seems this is an aarch64 related thing because on (Linux) x86_64/ppc64le I never observed this. >> Do you think those high values are not expected ? > > Also, to compute `from_pct`, we end up multiplying and then dividing by the same value `b->_freq`, which cancel out and simplify to `100 * b->succ_prob(j)`. Furthermore, succ_prob() should always return a value between 0.0 and 1.0, so the real problem is probably only `to_pct` and very small values of `target->_freq`. > Do you think those high values are not expected ? Sorry, my mistake. As @dean-long pointed out, they are to be expected with very small values of `target->_freq`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r1990884638 From duke at openjdk.org Wed Mar 12 08:03:57 2025 From: duke at openjdk.org (Anjian Wen) Date: Wed, 12 Mar 2025 08:03:57 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory In-Reply-To: References: Message-ID: <-VE5TS_L_X4JBQYQK3x2nXTDvBaWRKGPoPzi8VQpZWU=.4e331d21-b54b-48fc-9d7a-f65482caf435@github.com> On Wed, 12 Mar 2025 04:08:37 GMT, Fei Yang wrote: > You might want to try this JMH test: `make test TEST="micro:java.lang.foreign.MemorySegmentZeroUnsafe"` Thanks for the reminder. I will try to test it on my RISC-V MuseBook. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2716975911 From haosun at openjdk.org Wed Mar 12 08:07:02 2025 From: haosun at openjdk.org (Hao Sun) Date: Wed, 12 Mar 2025 08:07:02 GMT Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 03:00:39 GMT, Xiaohong Gong wrote:
>> Since PR [1] has added several new vector operations in VectorAPI and the X86 backend implementation for them, this patch adds the AArch64 backend part for NEON/SVE architectures.
>>
>> The performance of the related Vector API JMH microbenchmarks improves by about 70x ~ 95x on an NVIDIA Grace CPU, which is a 128-bit vector length SVE2 architecture, with different UseSVE options. Here are the gain details:
>>
>>     Benchmark                    (size)  Mode   Cnt  -XX:UseSVE=0  -XX:UseSVE=1  -XX:UseSVE=2
>>     ByteMaxVector.SADD           1024    thrpt  30   80.69x        79.70x        80.534x
>>     ByteMaxVector.SADDMasked     1024    thrpt  30   84.08x        85.72x        85.901x
>>     ByteMaxVector.SSUB           1024    thrpt  30   80.46x        80.27x        81.063x
>>     ByteMaxVector.SSUBMasked     1024    thrpt  30   83.96x        85.26x        85.887x
>>     ByteMaxVector.SUADD          1024    thrpt  30   80.43x        80.36x        81.761x
>>     ByteMaxVector.SUADDMasked    1024    thrpt  30   83.40x        84.62x        85.199x
>>     ByteMaxVector.SUSUB          1024    thrpt  30   79.93x        79.22x        79.714x
>>     ByteMaxVector.SUSUBMasked    1024    thrpt  30   82.93x        85.02x        84.726x
>>     ByteMaxVector.UMAX           1024    thrpt  30   78.73x        77.39x        78.220x
>>     ByteMaxVector.UMAXMasked     1024    thrpt  30   82.62x        84.77x        85.531x
>>     ByteMaxVector.UMIN           1024    thrpt  30   79.04x        77.80x        78.471x
>>     ByteMaxVector.UMINMasked     1024    thrpt  30   83.11x        84.86x        86.126x
>>     IntMaxVector.SADD            1024    thrpt  30   83.11x        83.07x        83.183x
>>     IntMaxVector.SADDMasked      1024    thrpt  30   90.67x        91.80x        93.162x
>>     IntMaxVector.SSUB            1024    thrpt  30   83.37x        82.82x        83.317x
>>     IntMaxVector.SSUBMasked      1024    thrpt  30   90.85x        92.87x        94.201x
>>     IntMaxVector.SUADD           1024    thrpt  30   82.76x        81.78x        82.679x
>>     IntMaxVector.SUADDMasked     1024    thrpt  30   90.49x        91.93x        93.155x
>>     IntMaxVector.SUSUB           1024    thrpt  30   82.92x        82.34x        82.525x
>>     IntMaxVector.SUSUBMasked     1024    thrpt  30   90.60x        92.12x        92.951x
>>     IntMaxVector.UMAX            1024    thrpt  30   82.40x        81.85x        82.242x
>>     IntMaxVector.UMAXMasked      1024    thrpt  30   90.30x        92.10x        92.587x
>>     IntMaxVector.UMIN            1024    thrpt  30   82.84x        81.43x        82.801x
>>     IntMaxVector.UMINMasked      1024    thrpt  30   90.43x        91.49x        92.678x
>>     LongMaxVector.SADD 102...
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>
>  - Merge branch 'jdk:master' into JDK_8349522
>  - 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations
>
>    Since PR [1] has added several new vector operations in VectorAPI
>    and the X86 backend implementation for them, this patch adds the
>    AArch64 backend part for NEON/SVE architectures.
>
>    The performance of Vector API relative jmh micro benchmarks can
>    improve about 70x ~ 95x on an AArch64 128-bit vector length sve2
>    architecture with different UseSVE options.
Here is the uplift > details: > > ``` > Benchmark (size) Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2 > ByteMaxVector.SADD 1024 thrpt 30 80.69x 79.70x 80.534x > ByteMaxVector.SADDMasked 1024 thrpt 30 84.08x 85.72x 85.901x > ByteMaxVector.SSUB 1024 thrpt 30 80.46x 80.27x 81.063x > ByteMaxVector.SSUBMasked 1024 thrpt 30 83.96x 85.26x 85.887x > ByteMaxVector.SUADD 1024 thrpt 30 80.43x 80.36x 81.761x > ByteMaxVector.SUADDMasked 1024 thrpt 30 83.40x 84.62x 85.199x > ByteMaxVector.SUSUB 1024 thrpt 30 79.93x 79.22x 79.714x > ByteMaxVector.SUSUBMasked 1024 thrpt 30 82.93x 85.02x 84.726x > ByteMaxVector.UMAX 1024 thrpt 30 78.73x 77.39x 78.220x > ByteMaxVector.UMAXMasked 1024 thrpt 30 82.62x 84.77x 85.531x > ByteMaxVector.UMIN 1024 thrpt 30 79.04x 77.80x 78.471x > ByteMaxVector.UMINMasked 1024 thrpt 30 83.11x 84.86x 86.126x > IntMaxVector.SADD 1024 thrpt 30 83.11x 83.07x 83.183x > IntMaxVector.SADDMasked 1024 thrpt 30 90.67x 91.80x 93.162x > IntMaxVector.SSUB 1024 thrpt 30 83.37x 82.82x 83.317x > IntMaxVector.SSUBMasked 1024 thrpt 30 90.85x 92.87x 94.201x > IntMaxVector.SUADD 1024 thrpt 30 82.76x 81.78x 82.679x > IntMaxVector.SUADDMasked 1024 thrpt 30 90.49x 91.93x 93.155x > IntMaxVector.SUSUB 1024 thrpt 30 82.92x 82.34x 82.525x > IntMaxVector.SUSUBMasked 1024 thrpt 30 90.60x 92.12x 92.951x > IntMaxVector.UMAX 1024 thrpt 30 8... LGTM ------------- Marked as reviewed by haosun (Committer). PR Review: https://git.openjdk.org/jdk/pull/23608#pullrequestreview-2677471995 From epeter at openjdk.org Wed Mar 12 08:45:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Mar 2025 08:45:58 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value In-Reply-To: References: Message-ID: <5fEVX0zAsdNd9v3Rk6Gr4lIXTc96g2LndUhX4Qb-bgc=.e4553c72-8da4-41c7-b71f-628bbeea14be@github.com> On Wed, 12 Mar 2025 08:01:15 GMT, Emanuel Peter wrote: >> Hi All, >> >> This bugfix patch fixes incorrect value computation for Integer/Long. compress APIs. 
>> >> Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value., In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. >> >> New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > src/hotspot/share/opto/intrinsicnode.cpp line 265: > >> 263: if (!mask_type->is_con()) { >> 264: if ( opc == Op_CompressBits) { >> 265: int mask_max_bw; > > Suggestion: > > // Pattern: Integer/Long.compress(src_type, mask_type) > int mask_max_bw; Can you also say what the meaning of `mask_max_bw` is? Possibly a more expressive name would help here too. > src/hotspot/share/opto/intrinsicnode.cpp line 283: > >> 281: clz = bt == T_INT ? clz - 32 : clz; >> 282: mask_max_bw = max_bw - clz; >> 283: } > > Can you please put the comments for cases 1-3 either consistently before the condition, or after the condition with inlining? I would vote for inside each condition with indentation, so just like case 3), except 2 spaces indented ;) Why not start with the "nice" case 3) first, where we know that the range is positive, and so even after compression we cannot get negative values? What does this mean `only includes +ve values`? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990900245 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990907216 From epeter at openjdk.org Wed Mar 12 08:45:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Mar 2025 08:45:57 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 17:37:36 GMT, Jatin Bhateja wrote: > Hi All, > > This bugfix patch fixes incorrect value computation for Integer/Long.compress APIs. > > Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value. In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. > > New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. > > Kindly review and share your feedback. > > Best Regards, > Jatin @jatin-bhateja Thanks for looking into this! I left a first set of comments :) Primarily, it is about these issues: - We need good comments, preferably even proofs. Because we got things wrong the last time, and there were no comments/proofs. It's difficult to get this sort of arithmetic transformation right, and it is hard to review. Proofs help to think through all the steps carefully. - Test coverage: I would like to see some more randomized cases of input ranges.
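A minimal sketch of what a randomized range case could look like, not taken from the PR. It assumes Java 19 or later for `Integer.compress`; the class name, `compressRef`, `clamp`, and `compressClamped` are hypothetical names introduced here. The bounds are `static final` on the assumption that the JIT treats them as constants, so the clamped mask has a known, non-constant range. A real jtreg test would presumably use the test library's random source and `@IR` rules instead of a plain loop.

```java
import java.util.Random;

public class CompressRandomRangeSketch {
    private static final Random RND = new Random();

    // Random but fixed bounds (hypothetical): initialized once at class load.
    public static final int LO;
    public static final int HI;
    static {
        int a = RND.nextInt();
        int b = RND.nextInt();
        LO = Math.min(a, b);
        HI = Math.max(a, b);
    }

    // Reference model: gather the bits of src selected by mask into the low end of the result.
    public static int compressRef(int src, int mask) {
        int result = 0;
        int outBit = 0;
        for (int i = 0; i < 32; i++) {
            if (((mask >>> i) & 1) != 0) {
                result |= ((src >>> i) & 1) << outBit;
                outBit++;
            }
        }
        return result;
    }

    // Clamp value into [lo, hi] with min/max.
    public static int clamp(int value, int lo, int hi) {
        return Math.max(lo, Math.min(hi, value));
    }

    // Method under test: the mask is variable, but its range is [LO, HI].
    public static int compressClamped(int src, int mask) {
        return Integer.compress(src, clamp(mask, LO, HI));
    }

    // Counts inputs for which the method under test disagrees with the reference model.
    public static int countMismatches(int iterations) {
        int mismatches = 0;
        for (int i = 0; i < iterations; i++) {
            int src = RND.nextInt();
            int mask = RND.nextInt();
            int expected = compressRef(src, clamp(mask, LO, HI));
            if (compressClamped(src, mask) != expected) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        int mismatches = countMismatches(100_000);
        if (mismatches != 0) {
            throw new AssertionError(mismatches + " mismatches for mask range [" + LO + ", " + HI + "]");
        }
        System.out.println("no mismatches for mask range [" + LO + ", " + HI + "]");
    }
}
```

Running the loop long enough for the method to be compiled by C2 is what would exercise the type computed by `CompressBitsNode::Value`; the iteration count above is a guess, not a tuned value.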
src/hotspot/share/opto/intrinsicnode.cpp line 265: > 263: if (!mask_type->is_con()) { > 264: if ( opc == Op_CompressBits) { > 265: int mask_max_bw; Suggestion: // Pattern: Integer/Long.compress(src_type, mask_type) int mask_max_bw; src/hotspot/share/opto/intrinsicnode.cpp line 266: > 264: if ( opc == Op_CompressBits) { > 265: int mask_max_bw; > 266: int max_bw = bt == T_INT ? 32 : 64; Should there be an assert somewhere that `bt` is either `T_INT` or `T_LONG`? src/hotspot/share/opto/intrinsicnode.cpp line 270: > 268: // strictly non-negative result value range. > 269: if ((mask_type->lo_as_long() < 0L && mask_type->hi_as_long() >= -1L)) { > 270: mask_max_bw = max_bw; This sounds like it should be the `else` case, where we can prove nothing special. I would put it last. src/hotspot/share/opto/intrinsicnode.cpp line 275: > 273: // a +ve value range. > 274: } else if (mask_type->hi_as_long() < -1L) { > 275: mask_max_bw = max_bw - 1; I would say something more explicit, like this: Case 2) The mask range does not include -1, which is the only case where all bits are set in the mask. Hence, at least one bit is not set in the mask, and so after compression the most significant bit, i.e. the sign bit is zero, and the compression result must thus be non-negative. src/hotspot/share/opto/intrinsicnode.cpp line 278: > 276: } else { > 277: // Case 3) Mask value range only includes +ve values, this can again be > 278: // used to ascertain known Zero bits of resultant value. I would put this case as the first, swapping it with Case 1). And I would say something more explicit like this: `Case 3) The mask value range is non-negative. Hence, the mask has at least one zero bit.` src/hotspot/share/opto/intrinsicnode.cpp line 280: > 278: // used to ascertain known Zero bits of resultant value. 
> 279: assert(mask_type->lo_as_long() >= 0, ""); > 280: jlong clz = count_leading_zeros(mask_type->hi_as_long()); Suggestion: jlong clz = count_leading_zeros(mask_type->hi_as_long()); // The mask has at least clz leading zeros, and hence also the compression // result must have at least clz leading zeros. src/hotspot/share/opto/intrinsicnode.cpp line 283: > 281: clz = bt == T_INT ? clz - 32 : clz; > 282: mask_max_bw = max_bw - clz; > 283: } Can you please put the comments for cases 1-3 either consistently before the condition, or after the condition with inlining? I would vote for inside each condition with indentation, so just like case 3), except 2 spaces indented ;) src/hotspot/share/opto/intrinsicnode.cpp line 285: > 283: } > 284: > 285: lo = mask_max_bw == max_bw ? lo : 0L; Suggestion: // If we have found that we have at least one zero bit in the mask, and hence at least // one leading zero in the compression result, we know that the compression result // is non-negative, and we can update lo accordingly. assert(lo <= 0L, "we are only narrowing the type"); lo = mask_max_bw == max_bw ? lo : 0L; Not sure if the assert is ok, but I think so. src/hotspot/share/opto/intrinsicnode.cpp line 289: > 287: // result value range is primarily dependent on true count > 288: // of participating mask value. Thus bit compression can never > 289: // result into a value greater than original value. To me this is not clear "inherently" ? Maybe there could be a quick proof like this: 1) src < 0, result < 0: only if mask == -1, and so src == result. 2) src < 0, result >= 0: src < result. 3) src >= 0, ... src/hotspot/share/opto/intrinsicnode.cpp line 294: > 292: // input equals lower bound of mask value range. > 293: hi = src_type->hi_as_long() == lo ? hi : src_type->hi_as_long(); > 294: hi = mask_max_bw < max_bw ? (1L << mask_max_bw) - 1 : hi; I need some more explanation / correctness proof here. I find it difficult to immediately see all possible cases ? 
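Not a proof, but a brute-force check of the facts the case analysis relies on can help while writing one. The sketch below uses its own bit-by-bit reference model (`compressRef`, a hypothetical name) rather than the C2 code, and checks four properties for 32-bit inputs: the result fits in `bitCount(mask)` low bits; the result is non-negative unless the mask is -1; for a non-negative mask the result has at least as many leading zeros as the mask; and for a non-negative source the result lies in `[0, src]`. Note that the last property is deliberately restricted to `src >= 0`, since for a negative source and a mask other than -1 the result is non-negative and therefore greater than the source.

```java
import java.util.Random;

public class CompressBoundsSketch {
    // Reference model: gather the bits of src selected by mask into the low end of the result.
    public static int compressRef(int src, int mask) {
        int result = 0;
        int outBit = 0;
        for (int i = 0; i < 32; i++) {
            if (((mask >>> i) & 1) != 0) {
                result |= ((src >>> i) & 1) << outBit;
                outBit++;
            }
        }
        return result;
    }

    // Returns true if all four properties hold for this (src, mask) pair.
    public static boolean boundsHold(int src, int mask) {
        int r = compressRef(src, mask);
        int lz = Integer.numberOfLeadingZeros(r);
        // 1) Only the low bitCount(mask) bits of the result can be set.
        boolean widthOk = lz >= 32 - Integer.bitCount(mask);
        // 2) If at least one mask bit is clear, the sign bit of the result is clear.
        boolean signOk = (mask == -1) || r >= 0;
        // 3) A non-negative mask with clz leading zeros gives a result with at least clz leading zeros.
        boolean clzOk = (mask < 0) || lz >= Integer.numberOfLeadingZeros(mask);
        // 4) For a non-negative source, selected bits only move towards the low end.
        boolean srcOk = (src < 0) || (r >= 0 && r <= src);
        return widthOk && signOk && clzOk && srcOk;
    }

    // Checks random pairs plus a few edge cases; returns true if no counterexample was found.
    public static boolean allHold(long seed, int iterations) {
        int[] edges = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE, -2};
        for (int src : edges) {
            for (int mask : edges) {
                if (!boundsHold(src, mask)) {
                    return false;
                }
            }
        }
        Random rnd = new Random(seed);
        for (int i = 0; i < iterations; i++) {
            if (!boundsHold(rnd.nextInt(), rnd.nextInt())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allHold(42L, 1_000_000) ? "all properties held" : "counterexample found");
    }
}
```

Property 3 is the per-value form of the range argument: if every mask in the range is at most `hi` and non-negative, each mask has at least `clz(hi)` leading zeros, and so does each result.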
test/hotspot/jtreg/compiler/c2/TestBitCompressValueTransform.java line 26: > 24: /* > 25: * @test > 26: * @key stress randomness Suggestion: I don't see any randomness or stressing in this test, correct? test/hotspot/jtreg/compiler/c2/TestBitCompressValueTransform.java line 27: > 25: * @test > 26: * @key stress randomness > 27: * @requires vm.compiler2.enabled & os.simpleArch == "x64" Can you please remove this restriction, so we can also run this test with other compilers and platforms? You can always restrict the IR rules instead ;) test/hotspot/jtreg/compiler/c2/TestBitCompressValueTransform.java line 99: > 97: @IR (counts = { IRNode.COMPRESS_BITS, " 0 "} , failOn = { IRNode.UNSTABLE_IF_TRAP }, applyIfCPUFeature = { "bmi2", "true" }) > 98: public long test4(long value) { > 99: long filter_bits = value & 0xF; It would be nice if we had some test cases with random ranges. So at least have something that puts in a random mask value. But alternatively also something that could span arbitrary ranges, maybe using `min/max` for clamping. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23947#pullrequestreview-2677433845 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990898233 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990890991 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990909653 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990916737 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990923240 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990924402 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990895378 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990937109 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990947893 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990950676 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990954143 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990881804 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990958427 From epeter at openjdk.org Wed Mar 12 08:45:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Mar 2025 08:45:58 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value In-Reply-To: <5fEVX0zAsdNd9v3Rk6Gr4lIXTc96g2LndUhX4Qb-bgc=.e4553c72-8da4-41c7-b71f-628bbeea14be@github.com> References: <5fEVX0zAsdNd9v3Rk6Gr4lIXTc96g2LndUhX4Qb-bgc=.e4553c72-8da4-41c7-b71f-628bbeea14be@github.com> Message-ID: On Wed, 12 Mar 2025 08:02:52 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/intrinsicnode.cpp line 265: >> >>> 263: if (!mask_type->is_con()) { >>> 264: if ( opc == Op_CompressBits) { >>> 265: int mask_max_bw; >> >> Suggestion: >> >> // Pattern: Integer/Long.compress(src_type, mask_type) >> int mask_max_bw; > > Can you also say what the meaning of `mask_max_bw` is? 
Possibly a more expressive name would help here too. If I'm right here, you want to say how many of the low significance bits are unknown, and that the rest, the high significance bits are all zero, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990908641 From epeter at openjdk.org Wed Mar 12 08:45:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Mar 2025 08:45:58 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value In-Reply-To: References: <5fEVX0zAsdNd9v3Rk6Gr4lIXTc96g2LndUhX4Qb-bgc=.e4553c72-8da4-41c7-b71f-628bbeea14be@github.com> Message-ID: On Wed, 12 Mar 2025 08:21:21 GMT, Emanuel Peter wrote: >> If I'm right here, you want to say how many of the low significance bits are unknown, and that the rest, the high significance bits are all zero, right? > > You could say that we are trying to find cases where we know that the compression result will have at least a certain number of leading zeros, which allows us to restrict the type. It may also be more understandable if you used `min_leading_zeros` instead of `mask_max_bw`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990934431 From epeter at openjdk.org Wed Mar 12 08:45:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Mar 2025 08:45:58 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value In-Reply-To: References: <5fEVX0zAsdNd9v3Rk6Gr4lIXTc96g2LndUhX4Qb-bgc=.e4553c72-8da4-41c7-b71f-628bbeea14be@github.com> Message-ID: On Wed, 12 Mar 2025 08:09:24 GMT, Emanuel Peter wrote: >> Can you also say what the meaning of `mask_max_bw` is? Possibly a more expressive name would help here too. > > If I'm right here, you want to say how many of the low significance bits are unknown, and that the rest, the high significance bits are all zero, right?
You could say that we are trying to find cases where we know that the compression result will have at least a certain number of leading zeros, which allows us to restrict the type. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r1990925747 From duke at openjdk.org Wed Mar 12 08:47:02 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 12 Mar 2025 08:47:02 GMT Subject: Integrated: 8350194: Last 2 parameters of ReturnNode::ReturnNode are swapped in the declaration In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 07:42:14 GMT, Manuel Hässig wrote: > The last two parameters in the declaration of ReturnNode::ReturnNode, `frameptr` and `retadr`, were swapped in the declaration compared to the definition. This commit makes the declaration consistent with the definition and the two usages in [`GraphKit::gen_stub()`](https://github.com/openjdk/jdk/blob/5c552a9d64c8116161cb9ef4c777e75a2602a75b/src/hotspot/share/opto/generateOptoStub.cpp#L267) and [`Compile::return_values()`](https://github.com/openjdk/jdk/blob/5c552a9d64c8116161cb9ef4c777e75a2602a75b/src/hotspot/share/opto/parse1.cpp#L879). > > Tests: tiers 1 through 3 passed. This pull request has now been integrated. Changeset: 1fe45265 Author: Manuel Hässig Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/1fe45265e446eeca5dc496085928ce20863a3172 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8350194: Last 2 parameters of ReturnNode::ReturnNode are swapped in the declaration Reviewed-by: thartmann, epeter ------------- PR: https://git.openjdk.org/jdk/pull/23927 From mli at openjdk.org Wed Mar 12 09:09:57 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 12 Mar 2025 09:09:57 GMT Subject: RFR: 8351662: [Test] RISC-V: enable bunch of IR test In-Reply-To: References: Message-ID: <39PEoD--Z8jMFlcVq2kmj-6bBQOEJdnd_Bi50vkvjF4=.f53766a5-6ec2-4757-a327-ef77e045f349@github.com> On Wed, 12 Mar 2025 03:27:43 GMT, Fei Yang wrote: > LGTM.
Thanks for doing this. Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23985#issuecomment-2717158059 From epeter at openjdk.org Wed Mar 12 09:27:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Mar 2025 09:27:53 GMT Subject: RFR: 8351345: [IR Framework] Improve reported disabled IR verification messages [v4] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 09:39:28 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this trivial patch? >> Currently, it only reports that some flag is non-whitelisted, but does not print out the flag explicitly, but it's helpful to do so. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor Nice idea, looks good to me too :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23931#pullrequestreview-2677689665 From rcastanedalo at openjdk.org Wed Mar 12 09:58:33 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 12 Mar 2025 09:58:33 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call Message-ID: The [array fill optimization](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/share/opto/loopTransform.cpp#L3533) replaces simple innermost loops that fill an array with copies of the same primitive value: for (int i = 0; i < array.length; i++) { array[i] = 0; } with a call to an [array filling intrinsic](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp#L1665) that is specialized for the array element type: arrayof_jint_fill(array, 0, array.length) The optimization retrieves the (basic) array element type from calling `MemNode::memory_type()` on the original filling store.
This is incorrect for stores of `short` values, since these are represented by `StoreC` nodes [whose `memory_type()` is `T_CHAR`](https://github.com/openjdk/jdk/blob/1fe45265e446eeca5dc496085928ce20863a3172/src/hotspot/share/opto/memnode.hpp#L680). As a result, the optimization wrongly assigns the address type `char[]` to `short` array fill loops. This can cause miscompilations due to missing anti-dependences, see the [issue description for further detail](https://bugs.openjdk.org/projects/JDK/issues/JDK-8351468). This changeset proposes retrieving the (basic) array element type from the store address type instead. This ensures that the accurate address type is assigned to the intrinsic call, preventing missed anti-dependences and other potential issues caused by mismatching types. A more general solution to this issue, and a way to prevent similar bugs in the future, would be to define a `StoreS` node returning the appropriate `memory_type()`. I propose to investigate this in a separate RFE and keep this fix as minimal and non-intrusive as possible for backportability. **Testing:** tier1-5, stress testing (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). 
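For illustration only, here is a sketch of the loop shape in question for a `short` array, combined with a load that must be ordered before the fill (the anti-dependence). This is not the test added by the PR, and no claim is made that it reproduces the miscompilation on an affected build; the class and method names are hypothetical, and the iteration count is a guess at what is enough to trigger JIT compilation.

```java
public class ShortFillSketch {
    // Reads one element, then overwrites the whole array with a simple fill loop.
    // The load must observe the value from before the fill.
    public static short readThenFill(short[] array, int index, short value) {
        short before = array[index];
        for (int i = 0; i < array.length; i++) {
            array[i] = value;
        }
        return before;
    }

    // Calls readThenFill repeatedly; returns false if a load ever observed the new value.
    public static boolean check(int iterations) {
        short[] array = new short[256]; // starts out all zero
        for (int n = 0; n < iterations; n++) {
            // After iteration n - 1 the array holds (short) n everywhere.
            short before = readThenFill(array, 0, (short) (n + 1));
            if (before != (short) n) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(check(100_000) ? "loads saw the old value" : "a load saw the filled value");
    }
}
```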
------------- Commit messages: - Remove temporary assertion - Add test - Refine assertion to deal with aliasing byte/Boolean types - Compute basic type from the store's address type, circumventing the inaccurate MemNode::memory_type() Changes: https://git.openjdk.org/jdk/pull/24005/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24005&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351468 Stats: 213 lines in 2 files changed: 212 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24005.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24005/head:pull/24005 PR: https://git.openjdk.org/jdk/pull/24005 From shade at openjdk.org Wed Mar 12 09:58:33 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 09:58:33 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 09:47:17 GMT, Roberto Castañeda Lozano wrote: > The [array fill optimization](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/share/opto/loopTransform.cpp#L3533) replaces simple innermost loops that fill an array with copies of the same primitive value: > > > for (int i = 0; i < array.length; i++) { > array[i] = 0; > } > > with a call to an [array filling intrinsic](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp#L1665) that is specialized for the array element type: > > > arrayof_jint_fill(array, 0, array.length) > > > The optimization retrieves the (basic) array element type from calling `MemNode::memory_type()` on the original filling store. This is incorrect for stores of `short` values, since these are represented by `StoreC` nodes [whose `memory_type()` is `T_CHAR`](https://github.com/openjdk/jdk/blob/1fe45265e446eeca5dc496085928ce20863a3172/src/hotspot/share/opto/memnode.hpp#L680).
As a result, the optimization wrongly assigns the address type `char[]` to `short` array fill loops. This can cause miscompilations due to missing anti-dependences, see the [issue description for further detail](https://bugs.openjdk.org/projects/JDK/issues/JDK-8351468). > > This changeset proposes retrieving the (basic) array element type from the store address type instead. This ensures that the accurate address type is assigned to the intrinsic call, preventing missed anti-dependences and other potential issues caused by mismatching types. A more general solution to this issue, and a way to prevent similar bugs in the future, would be to define a `StoreS` node returning the appropriate `memory_type()`. I propose to investigate this in a separate RFE and keep this fix as minimal and non-intrusive as possible for backportability. > > **Testing:** tier1-5, stress testing (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). Wow, nice landmine. (If you pull from current master, GHA should become clean) ------------- PR Review: https://git.openjdk.org/jdk/pull/24005#pullrequestreview-2677781604 From bkilambi at openjdk.org Wed Mar 12 10:00:54 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 12 Mar 2025 10:00:54 GMT Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 03:00:39 GMT, Xiaohong Gong wrote: >> Since PR [1] has added several new vector operations in VectorAPI and the X86 backend implementation for them, this patch adds the AArch64 backend part for NEON/SVE architectures. >> >> The performance of Vector API relative JMH micro benchmarks can improve about 70x ~ 95x on a NVIDIA Grace CPU, which is a 128-bit vector length sve2 architecture, with different UseSVE options. 
Here is the gain details: >> >> >> Benchmark (size) Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2 >> ByteMaxVector.SADD 1024 thrpt 30 80.69x 79.70x 80.534x >> ByteMaxVector.SADDMasked 1024 thrpt 30 84.08x 85.72x 85.901x >> ByteMaxVector.SSUB 1024 thrpt 30 80.46x 80.27x 81.063x >> ByteMaxVector.SSUBMasked 1024 thrpt 30 83.96x 85.26x 85.887x >> ByteMaxVector.SUADD 1024 thrpt 30 80.43x 80.36x 81.761x >> ByteMaxVector.SUADDMasked 1024 thrpt 30 83.40x 84.62x 85.199x >> ByteMaxVector.SUSUB 1024 thrpt 30 79.93x 79.22x 79.714x >> ByteMaxVector.SUSUBMasked 1024 thrpt 30 82.93x 85.02x 84.726x >> ByteMaxVector.UMAX 1024 thrpt 30 78.73x 77.39x 78.220x >> ByteMaxVector.UMAXMasked 1024 thrpt 30 82.62x 84.77x 85.531x >> ByteMaxVector.UMIN 1024 thrpt 30 79.04x 77.80x 78.471x >> ByteMaxVector.UMINMasked 1024 thrpt 30 83.11x 84.86x 86.126x >> IntMaxVector.SADD 1024 thrpt 30 83.11x 83.07x 83.183x >> IntMaxVector.SADDMasked 1024 thrpt 30 90.67x 91.80x 93.162x >> IntMaxVector.SSUB 1024 thrpt 30 83.37x 82.82x 83.317x >> IntMaxVector.SSUBMasked 1024 thrpt 30 90.85x 92.87x 94.201x >> IntMaxVector.SUADD 1024 thrpt 30 82.76x 81.78x 82.679x >> IntMaxVector.SUADDMasked 1024 thrpt 30 90.49x 91.93x 93.155x >> IntMaxVector.SUSUB 1024 thrpt 30 82.92x 82.34x 82.525x >> IntMaxVector.SUSUBMasked 1024 thrpt 30 90.60x 92.12x 92.951x >> IntMaxVector.UMAX 1024 thrpt 30 82.40x 81.85x 82.242x >> IntMaxVector.UMAXMasked 1024 thrpt 30 90.30x 92.10x 92.587x >> IntMaxVector.UMIN 1024 thrpt 30 82.84x 81.43x 82.801x >> IntMaxVector.UMINMasked 1024 thrpt 30 90.43x 91.49x 92.678x >> LongMaxVector.SADD 102... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: > > - Merge branch 'jdk:master' into JDK_8349522 > - 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations > > Since PR [1] has added several new vector operations in VectorAPI > and the X86 backend implementation for them, this patch adds the > AArch64 backend part for NEON/SVE architectures. > > The performance of Vector API relative jmh micro benchmarks can > improve about 70x ~ 95x on an AArch64 128-bit vector length sve2 > architecture with different UseSVE options. Here is the uplift > details: > > ``` > Benchmark (size) Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2 > ByteMaxVector.SADD 1024 thrpt 30 80.69x 79.70x 80.534x > ByteMaxVector.SADDMasked 1024 thrpt 30 84.08x 85.72x 85.901x > ByteMaxVector.SSUB 1024 thrpt 30 80.46x 80.27x 81.063x > ByteMaxVector.SSUBMasked 1024 thrpt 30 83.96x 85.26x 85.887x > ByteMaxVector.SUADD 1024 thrpt 30 80.43x 80.36x 81.761x > ByteMaxVector.SUADDMasked 1024 thrpt 30 83.40x 84.62x 85.199x > ByteMaxVector.SUSUB 1024 thrpt 30 79.93x 79.22x 79.714x > ByteMaxVector.SUSUBMasked 1024 thrpt 30 82.93x 85.02x 84.726x > ByteMaxVector.UMAX 1024 thrpt 30 78.73x 77.39x 78.220x > ByteMaxVector.UMAXMasked 1024 thrpt 30 82.62x 84.77x 85.531x > ByteMaxVector.UMIN 1024 thrpt 30 79.04x 77.80x 78.471x > ByteMaxVector.UMINMasked 1024 thrpt 30 83.11x 84.86x 86.126x > IntMaxVector.SADD 1024 thrpt 30 83.11x 83.07x 83.183x > IntMaxVector.SADDMasked 1024 thrpt 30 90.67x 91.80x 93.162x > IntMaxVector.SSUB 1024 thrpt 30 83.37x 82.82x 83.317x > IntMaxVector.SSUBMasked 1024 thrpt 30 90.85x 92.87x 94.201x > IntMaxVector.SUADD 1024 thrpt 30 82.76x 81.78x 82.679x > IntMaxVector.SUADDMasked 1024 thrpt 30 90.49x 91.93x 93.155x > IntMaxVector.SUSUB 1024 thrpt 30 82.92x 82.34x 82.525x > IntMaxVector.SUSUBMasked 1024 thrpt 30 90.60x 92.12x 92.951x > IntMaxVector.UMAX 1024 thrpt 30 8... Looks good to me ------------- Marked as reviewed by bkilambi (Author). 
PR Review: https://git.openjdk.org/jdk/pull/23608#pullrequestreview-2677791094 From duke at openjdk.org Wed Mar 12 10:07:38 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Wed, 12 Mar 2025 10:07:38 GMT Subject: RFR: 8350485: C2: factor out common code in Node::grow() and Node::out_grow() [v3] In-Reply-To: References: Message-ID: <_rgS_qoyZt6mhsO0oEnVEhFw0fYGirVUpvjqFUokkJ8=.f220fbf7-b801-4ceb-a8b8-40a055fe072d@github.com> > Node::grow() and Node::out_grow() are copy-pasted from each other and their core logic could be factored out into a third function or at least cleaned up. Hence, the fix includes a function Node::array_resize() that implements the core logic of Node::grow() and Node::out_grow(). > > Link to the GitHub Actions run, which had no failures: https://github.com/sarannat/jdk/actions/runs/13677508359 Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: 8350485: Addressing review comments with code formatting and fixing/removing comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23928/files - new: https://git.openjdk.org/jdk/pull/23928/files/f308762b..6e94377d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23928&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23928&range=01-02 Stats: 15 lines in 2 files changed: 0 ins; 5 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/23928.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23928/head:pull/23928 PR: https://git.openjdk.org/jdk/pull/23928 From mli at openjdk.org Wed Mar 12 10:21:52 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 12 Mar 2025 10:21:52 GMT Subject: RFR: 8351345: [IR Framework] Improve reported disabled IR verification messages [v4] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 09:25:40 GMT, Emanuel Peter wrote: > Nice idea, looks good to me too :) Thank you!
------------- PR Comment: https://git.openjdk.org/jdk/pull/23931#issuecomment-2717367229 From qamai at openjdk.org Wed Mar 12 10:25:05 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 12 Mar 2025 10:25:05 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: References: Message-ID: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> On Wed, 12 Mar 2025 09:47:17 GMT, Roberto Castañeda Lozano wrote: > The [array fill optimization](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/share/opto/loopTransform.cpp#L3533) replaces simple innermost loops that fill an array with copies of the same primitive value: > > > for (int i = 0; i < array.length; i++) { > array[i] = 0; > } > > with a call to an [array filling intrinsic](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp#L1665) that is specialized for the array element type: > > > arrayof_jint_fill(array, 0, array.length) > > > The optimization retrieves the (basic) array element type from calling `MemNode::memory_type()` on the original filling store. This is incorrect for stores of `short` values, since these are represented by `StoreC` nodes [whose `memory_type()` is `T_CHAR`](https://github.com/openjdk/jdk/blob/1fe45265e446eeca5dc496085928ce20863a3172/src/hotspot/share/opto/memnode.hpp#L680). As a result, the optimization wrongly assigns the address type `char[]` to `short` array fill loops. This can cause miscompilations due to missing anti-dependences, see the [issue description for further detail](https://bugs.openjdk.org/projects/JDK/issues/JDK-8351468). > > This changeset proposes retrieving the (basic) array element type from the store address type instead.
This ensures that the accurate address type is assigned to the intrinsic call, preventing missed anti-dependences and other potential issues caused by mismatching types. A more general solution to this issue, and a way to prevent similar bugs in the future, would be to define a `StoreS` node returning the appropriate `memory_type()`. I propose to investigate this in a separate RFE and keep this fix as minimal and non-intrusive as possible for backportability. > > **Testing:** tier1-5, stress testing (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). I think the issue here is the implementation of `MemNode::memory_type()`: it says that it returns the type of the value in memory, but it always returns `T_CHAR` for `StoreC`, which seems nonsensical. What if I `StoreC` to a `long[]`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2717377473 From duke at openjdk.org Wed Mar 12 10:51:43 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Wed, 12 Mar 2025 10:51:43 GMT Subject: RFR: 8330469: C2: Remove or change "PrintOpto && VerifyLoopOptimizations" as printing code condition [v3] In-Reply-To: <0u5jTfXdZR63O78QixBJEql2WKWLvcyaeqsN_gAL3OA=.40f718c3-b99e-4183-80e1-fb3b5dc6074e@github.com> References: <0u5jTfXdZR63O78QixBJEql2WKWLvcyaeqsN_gAL3OA=.40f718c3-b99e-4183-80e1-fb3b5dc6074e@github.com> Message-ID: > **Issue:** There are currently 9 occurrences where we guard printing code with `PrintOpto && VerifyLoopOptimizations`. This flag combo is never really used in practice. > > **Solution**: I analysed the 9 occurrences. In cases where the condition `PrintOpto && VerifyLoopOptimizations` was followed by the flag `TraceLoopOpts` via `else if` or the `||` operator, I removed the former condition. In the other cases, where `PrintOpto && VerifyLoopOptimizations` was the only condition, it was replaced with `TraceLoopOpts`.
> > **Test Result**: Link to [GitHub Action](https://github.com/sarannat/jdk/actions/runs/13723071055) run on commit [91ecc51](https://github.com/sarannat/jdk/commit/91ecc5190ce31da94bded4de210136f337286e69) Saranya Natarajan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8330469 - JDK-8330469: Addressing review comments by removing TraceLoopOpts and some dump() - 8330469 : Removed or replaced (PrintOpto && VerifyLoopOptimizations) with TraceLoopOpts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23959/files - new: https://git.openjdk.org/jdk/pull/23959/files/b7216ca6..3ee2e887 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23959&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23959&range=01-02 Stats: 48784 lines in 726 files changed: 24534 ins; 15325 del; 8925 mod Patch: https://git.openjdk.org/jdk/pull/23959.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23959/head:pull/23959 PR: https://git.openjdk.org/jdk/pull/23959 From fyang at openjdk.org Wed Mar 12 11:55:25 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 12 Mar 2025 11:55:25 GMT Subject: RFR: 8351839: RISC-V: Fix base offset calculation introduced in JDK-8347489 Message-ID: As discussed in https://github.com/openjdk/jdk/pull/23633#discussion_r1974591975, there is no need to distinuish `T_BYTE` and `T_CHAR` when calculating base offset for strings. The reason is that the low-level character storage used for both Latin1 and UTF16 strings is always a byte array [1]. So we should always use `T_BYTE` for both cases. This won't make a difference on the calculated base offset for now. But it's better to fix this for code readability purposes. Sanity tested on linux-riscv64 w/wo COH. 
[1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L160 ------------- Commit messages: - 8351839: RISC-V: Fix base offset calculation introduced in JDK-8347489 Changes: https://git.openjdk.org/jdk/pull/24006/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24006&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351839 Stats: 14 lines in 2 files changed: 0 ins; 6 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24006/head:pull/24006 PR: https://git.openjdk.org/jdk/pull/24006 From epeter at openjdk.org Wed Mar 12 12:26:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Mar 2025 12:26:56 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v2] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 16:05:38 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. >> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". >> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. > > Liam Miller-Cushon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: > > - copyright > - style > - Merge branch 'openjdk:master' into mernst/JDK-8350563 > - RegTest > - Merge branch 'openjdk:master' into mernst/JDK-8350563 > - push `con->(cast*)->and` uses Thanks for the updates! It looks much better :) src/hotspot/share/opto/phaseX.cpp line 2008: > 2006: ((use_op == Op_LShiftI || use_op == Op_LShiftL) && use->in(2) == parent)) { > 2007: > 2008: auto push_and_uses_to_worklist = [&](Node* n) { Amazing, this looks much better. I suggest you rename `new_type` -> `parent_type`, just to keep things consistent. test/hotspot/jtreg/compiler/c2/TestAndConZeroCCP.java line 1: > 1: /* The test should probably be moved to `test/hotspot/jtreg/compiler/ccp/`, that would be more specific. test/hotspot/jtreg/compiler/c2/TestAndConZeroCCP.java line 28: > 26: * @bug 8350563 > 27: * @summary Test that And nodes are added to the CCP worklist if they have a constant as input. > 28: * @run main/othervm -Xbatch -XX:-TieredCompilation compiler.c2.TestAndConZeroCCP Suggestion: * @run main/othervm -Xbatch -XX:-TieredCompilation compiler.c2.TestAndConZeroCCP * @run driver compiler.c2.TestAndConZeroCCP That way we also have a run without flags, just in case that triggers some other (unrelated) bug. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23871#pullrequestreview-2678231141 PR Review Comment: https://git.openjdk.org/jdk/pull/23871#discussion_r1991363356 PR Review Comment: https://git.openjdk.org/jdk/pull/23871#discussion_r1991360581 PR Review Comment: https://git.openjdk.org/jdk/pull/23871#discussion_r1991358419 From epeter at openjdk.org Wed Mar 12 12:39:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Mar 2025 12:39:04 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations In-Reply-To: References: Message-ID: On Sat, 1 Mar 2025 13:34:30 GMT, Hannes Greule wrote: > subnode has multiple nodes that are self-inverse but lacking the respective optimization. ReverseINode and ReverseLNode already have the optimization, but we can deduplicate the code for all those operations. > > For most nodes, the optimization is obvious. The NegF/DNodes however are worth to look at in detail imo: > - `Float.NaN` has the same bits set as `-Float.NaN`. That means, it this specific case, the operation is a no-op anyway > - For other values, the msb is flipped, flipping twice results in the original value again. > > Similar changes could be made to the corresponding vector nodes. If you want, I can do that in a follow-up RFE. > > One note: During benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like > > int v = 0; > for (int datum : data) { > v ^= Integer.reverseBytes(Integer.reverseBytes(datum)); > } > return v; > > was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases. Nice idea! Thanks for the work :) test/hotspot/jtreg/compiler/c2/irTests/InvolutionIdentityTests.java line 83: > 81: assertResultF(nanf); > 82: > 83: double ad = RunInfo.getRandom().nextDouble(); This actually only generates values between `0.0...1.0`. Can you instead use `Generators.java`? 
It will make sure to generate "interesting" values, including different encodings of `NaN`, infinity, etc. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23851#pullrequestreview-2678314776 PR Review Comment: https://git.openjdk.org/jdk/pull/23851#discussion_r1991404835 From rcastanedalo at openjdk.org Wed Mar 12 12:38:36 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 12 Mar 2025 12:38:36 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call [v2] In-Reply-To: References: Message-ID: <_LMbjG_E1U4lFL5Ba9cCzjiqLHr4agiWtjwdPjGpjNY=.e87b3585-0df6-4608-bfdb-cff505fba6a3@github.com> > The [array fill optimization](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/share/opto/loopTransform.cpp#L3533) replaces simple innermost loops that fill an array with copies of the same primitive value: > > > for (int i = 0; i < array.length; i++) { > array[i] = 0; > } > > with a call to an [array filling intrinsic](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp#L1665) that is specialized for the array element type: > > > arrayof_jint_fill(array, 0, array.length) > > > The optimization retrieves the (basic) array element type from calling `MemNode::memory_type()` on the original filling store. This is incorrect for stores of `short` values, since these are represented by `StoreC` nodes [whose `memory_type()` is `T_CHAR`](https://github.com/openjdk/jdk/blob/1fe45265e446eeca5dc496085928ce20863a3172/src/hotspot/share/opto/memnode.hpp#L680). As a result, the optimization wrongly assigns the address type `char[]` to `short` array fill loops. This can cause miscompilations due to missing anti-dependences, see the [issue description for further detail](https://bugs.openjdk.org/projects/JDK/issues/JDK-8351468). 
> > This changeset proposes retrieving the (basic) array element type from the store address type instead. This ensures that the accurate address type is assigned to the intrinsic call, preventing missed anti-dependences and other potential issues caused by mismatching types. A more general solution to this issue, and a way to prevent similar bugs in the future, would be to define a `StoreS` node returning the appropriate `memory_type()`. I propose to investigate this in a separate RFE and keep this fix as minimal and non-intrusive as possible for backportability. > > **Testing:** tier1-5, stress testing (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). Roberto Casta?eda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge master - Remove temporary assertion - Add test - Refine assertion to deal with aliasing byte/Boolean types - Compute basic type from the store's address type, circumventing the innacurate MemNode::memory_type() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24005/files - new: https://git.openjdk.org/jdk/pull/24005/files/47e478a6..90fd7660 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24005&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24005&range=00-01 Stats: 42515 lines in 533 files changed: 19874 ins; 14843 del; 7798 mod Patch: https://git.openjdk.org/jdk/pull/24005.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24005/head:pull/24005 PR: https://git.openjdk.org/jdk/pull/24005 From epeter at openjdk.org Wed Mar 12 12:43:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Mar 2025 12:43:52 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 01:18:57 GMT, Xiaohong 
Gong wrote: > The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. > > Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. > > This patch added the rearrange support for vector types with small lane count. Here are the main changes: > - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) > - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation > - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one > > Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. 
> > 1) NEON > > JMH on panama-vector:vectorIntrinsics: > > Benchmark (size) Mode Cnt Units Before After Gain > Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x > Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x > Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x > Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x > Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x > Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x > Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x > Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x > Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x > Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x > Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x > Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.999 747.848 10.38x > > > JMH on jdk mainline: > > Benchmark ... Let me run some extra testing, please ping me in 1-2 days. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2717760146 From duke at openjdk.org Wed Mar 12 12:46:58 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Wed, 12 Mar 2025 12:46:58 GMT Subject: RFR: 8330469: C2: Remove or change "PrintOpto && VerifyLoopOptimizations" as printing code condition [v3] In-Reply-To: References: <0u5jTfXdZR63O78QixBJEql2WKWLvcyaeqsN_gAL3OA=.40f718c3-b99e-4183-80e1-fb3b5dc6074e@github.com> Message-ID: On Mon, 10 Mar 2025 12:00:24 GMT, Christian Hagedorn wrote: >> Saranya Natarajan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8330469 >> - JDK-8330469: Addressing review comments by removing TraceLoopOpts and some dump() >> - 8330469 : Removed or replaced (PrintOpto && VerifyLoopOptimizations) with TraceLoopOpts > > src/hotspot/share/opto/split_if.cpp line 143: > >> 141: tty->print("Cloning up: "); >> 142: n->dump(); >> 143: } > > Same here with this print and the other places in `clone_cmp_down()`: I think it's too verbose for `TraceLoopOpts`. > > Since Split-If is quite complex, I think it would make sense to add a "TraceSplitIf" flag to get more information about the optimization. It's probably out of scope of this bug, so we could do that in a separate RFE. For this PR, I suggest to just drop these printings and link this PR to the "TraceSplitIf" RFE in order to restore/update/improve these. Thank you for the review. I have created RFE [JDK-8351847](https://bugs.openjdk.org/browse/JDK-8351847) for the "TraceSplitIf" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23959#discussion_r1991419982 From epeter at openjdk.org Wed Mar 12 12:47:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Mar 2025 12:47:55 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 01:18:57 GMT, Xiaohong Gong wrote: > The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. > > Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. 
However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. > > This patch added the rearrange support for vector types with small lane count. Here are the main changes: > - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) > - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation > - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one > > Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. > > 1) NEON > > JMH on panama-vector:vectorIntrinsics: > > Benchmark (size) Mode Cnt Units Before After Gain > Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x > Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x > Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x > Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x > Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x > Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x > Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x > Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x > Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x > Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x > Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x > Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.999 747.848 10.38x > > > JMH on jdk mainline: > > Benchmark ... 
It would also be good to add some IR tests, or possibly modify existing IR rules to reflect that more cases now vectorize, according to `Matcher::match_rule_supported_vector`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2717769404 From mli at openjdk.org Wed Mar 12 13:28:06 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 12 Mar 2025 13:28:06 GMT Subject: Integrated: 8351345: [IR Framework] Improve reported disabled IR verification messages In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 12:16:57 GMT, Hamlin Li wrote: > Hi, > Can you help to review this trivial patch? > Currently, it only reports that some flag is non-whitelisted, but does not print out the flag explicitly, but it's helpful to do so. > > Thanks! This pull request has now been integrated. Changeset: 3b189e0e Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/3b189e0e78c867b75e984bfaabc92d12b9ff2b9e Stats: 44 lines in 1 file changed: 23 ins; 7 del; 14 mod 8351345: [IR Framework] Improve reported disabled IR verification messages Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/23931 From roland at openjdk.org Wed Mar 12 13:28:13 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 12 Mar 2025 13:28:13 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v12] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). 
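As an illustration only (not taken from the PR), the kind of loop this description is about can be sketched as follows: a loop whose induction variable is a `long`, with a `long` range check on every iteration, that in practice runs for only a few iterations. The class name, method name, and array size are hypothetical, and whether C2 actually builds or skips the loop nest for this code depends on the JDK build and flags.

```java
import java.util.Objects;

public class ShortLongCountedLoop {
    // A long counted loop: the induction variable `i` is a long, and
    // Objects.checkIndex(long, long) acts as a long range check.
    static long sum(int[] data, long start, long stop) {
        long total = 0;
        for (long i = start; i < stop; i++) {
            long idx = Objects.checkIndex(i, (long) data.length);
            total += data[(int) idx];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        long result = 0;
        // Call the method repeatedly so that it becomes hot enough to be
        // compiled; each call runs only data.length (100) iterations,
        // i.e. the loop is short running.
        for (int rep = 0; rep < 20_000; rep++) {
            result = sum(data, 0, data.length);
        }
        System.out.println(result);
    }
}
```

Each call to `sum` iterates 100 times, well below the default of 1000 mentioned for `ShortLoopIter`, so this is the sort of case where an outer loop whose backedge is never taken would be pure overhead.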
> > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loops. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), so we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate.
> > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: - merge - Merge branch 'master' into JDK-8342692 - Merge branch 'master' into JDK-8342692 - whitespace - Merge branch 'master' into JDK-8342692 - TestMemorySegment test fix - test wip - Merge branch 'master' into JDK-8342692 - refactor - Merge branch 'master' into JDK-8342692 - ... and 27 more: https://git.openjdk.org/jdk/compare/08872623...f740b966 ------------- Changes: https://git.openjdk.org/jdk/pull/21630/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=11 Stats: 1310 lines in 25 files changed: 1250 ins; 13 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From mli at openjdk.org Wed Mar 12 13:53:05 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 12 Mar 2025 13:53:05 GMT Subject: RFR: 8351839: RISC-V: Fix base offset calculation introduced in JDK-8347489 In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 11:24:23 GMT, Fei Yang wrote: > As discussed in https://github.com/openjdk/jdk/pull/23633#discussion_r1974591975, there is no need to distinguish between `T_BYTE` and `T_CHAR` when calculating the base offset for strings. > The reason is that the low-level character storage used for both Latin1 and UTF16 strings is always a byte array [1]. > So we should always use `T_BYTE` for both cases. This won't make a difference to the calculated base offset for now. > But it's better to fix this for code readability purposes. Sanity tested on linux-riscv64 w/wo COH. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L160 Looks good.
------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24006#pullrequestreview-2678568197 From rcastanedalo at openjdk.org Wed Mar 12 14:17:55 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 12 Mar 2025 14:17:55 GMT Subject: RFR: 8350485: C2: factor out common code in Node::grow() and Node::out_grow() [v3] In-Reply-To: <_rgS_qoyZt6mhsO0oEnVEhFw0fYGirVUpvjqFUokkJ8=.f220fbf7-b801-4ceb-a8b8-40a055fe072d@github.com> References: <_rgS_qoyZt6mhsO0oEnVEhFw0fYGirVUpvjqFUokkJ8=.f220fbf7-b801-4ceb-a8b8-40a055fe072d@github.com> Message-ID: On Wed, 12 Mar 2025 10:07:38 GMT, Saranya Natarajan wrote: >> Node::grow() and Node::out_grow() are copy-pasted from each other and their core logic could be factored out into a third function or at least cleaned up. Hence, the fix includes a function Node::array_resize() that implements the core logic of Node::grow() and Node::out_grow(). >> >> Link to the GitHub Actions run, which had no failures: https://github.com/sarannat/jdk/actions/runs/13677508359 > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > 8350485: Addressing review comments with code formatting and fixing/removing comments Looks good, thanks! ------------- Marked as reviewed by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23928#pullrequestreview-2678681812 From sviswanathan at openjdk.org Wed Mar 12 14:18:57 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 12 Mar 2025 14:18:57 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input [v2] In-Reply-To: References: Message-ID: <_cQR4s9TEmTN7kEl88euP4PIneD_syVmCCvaz82Exf4=.f2f008e5-186d-4844-9e6d-950d01dd1b9b@github.com> On Wed, 12 Mar 2025 03:37:23 GMT, Jatin Bhateja wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments > > test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 78: > >> 76: @IR(counts = {IRNode.VECTOR_CAST_HF2F, "> 0"}, >> 77: applyIfOr = {"UseCompactObjectHeaders", "false", "AlignVector", "false"}, >> 78: applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}, > > Can you kindly justify the need for compressed object header usage? It will mainly impact the pre-loop trip count computation. AlignVector should be sufficient since it's a whitelisted option. This check is taken from compiler/vectorization/TestFloatConversionsVector.java, which also has float16 conversion tests, to keep the two in sync.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1991601610 From chagedorn at openjdk.org Wed Mar 12 14:25:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 12 Mar 2025 14:25:59 GMT Subject: RFR: 8330469: C2: Remove or change "PrintOpto && VerifyLoopOptimizations" as printing code condition [v3] In-Reply-To: References: <0u5jTfXdZR63O78QixBJEql2WKWLvcyaeqsN_gAL3OA=.40f718c3-b99e-4183-80e1-fb3b5dc6074e@github.com> Message-ID: On Wed, 12 Mar 2025 12:44:19 GMT, Saranya Natarajan wrote: >> src/hotspot/share/opto/split_if.cpp line 143: >> >>> 141: tty->print("Cloning up: "); >>> 142: n->dump(); >>> 143: } >> >> Same here with this print and the other places in `clone_cmp_down()`: I think it's too verbose for `TraceLoopOpts`. >> >> Since Split-If is quite complex, I think it would make sense to add a "TraceSplitIf" flag to get more information about the optimization. It's probably out of scope of this bug, so we could do that in a separate RFE. For this PR, I suggest to just drop these printings and link this PR to the "TraceSplitIf" RFE in order to restore/update/improve these. > > Thank you for the review. I have created RFE [JDK-8351847](https://bugs.openjdk.org/browse/JDK-8351847) for the "TraceSplitIf" Awesome, thanks a lot! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23959#discussion_r1991617296 From rcastanedalo at openjdk.org Wed Mar 12 14:35:57 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 12 Mar 2025 14:35:57 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> References: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> Message-ID: On Wed, 12 Mar 2025 10:22:15 GMT, Quan Anh Mai wrote: > I think the issue here is the implementation of `MemNode::memory_type()`. I agree, in particular the fact that `StoreC` nodes are used to represent both `short` and `char` stores but always return `T_CHAR` as their `memory_type()`. That is why I propose to simply circumvent the usage of `MemNode::memory_type()` to compute the type of the array fill intrinsic in this changeset, and explore creating a dedicated `StoreS` node in a separate RFE. > what if I `StoreC` to a `long[]`? A store of a `char` value into a `long[]` array would be represented at the IR level as a conversion (`ConvI2L`) followed by a `StoreL`, no? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2718102394 From qamai at openjdk.org Wed Mar 12 14:45:57 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 12 Mar 2025 14:45:57 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: References: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> Message-ID: On Wed, 12 Mar 2025 14:33:15 GMT, Roberto Castañeda Lozano wrote: > A store of a char value into a long[] array would be represented at the IR level as a conversion (ConvI2L) followed by a StoreL, no?
No, code such as `MemorySegment.ofArray(longArray).set(ValueLayout.JAVA_SHORT, offset, c)` would produce a `StoreC` into a `long[]`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2718133834 From mli at openjdk.org Wed Mar 12 14:53:03 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 12 Mar 2025 14:53:03 GMT Subject: RFR: 8351861: RISC-V: add simple assert at arrays_equals_v Message-ID: Hi, Can you help to review this trivial patch? `arrays_equals_v` and `arrays_equals` are two versions of the same node, `AryEqNode`, so their `elem_size` input should be the same and they should share the same assert on `elem_size`; this also makes the code below clearer. We could do something similar to https://github.com/openjdk/jdk/pull/24006 at the same time, but since the code of `arrays_equals_v` and `arrays_equals` does not seem to require the input to be a byte[] (although at the Java level a string's payload is indeed a byte[]), I'll just leave it as is. Thanks ------------- Commit messages: - assert diff registers - initial commit Changes: https://git.openjdk.org/jdk/pull/24008/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24008&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351861 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24008.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24008/head:pull/24008 PR: https://git.openjdk.org/jdk/pull/24008 From rehn at openjdk.org Wed Mar 12 15:01:52 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 12 Mar 2025 15:01:52 GMT Subject: RFR: 8351662: [Test] RISC-V: enable bunch of IR test In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 14:08:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > There are a bunch of IR tests not enabled on riscv; it's good to enable them, because enabling them will help to: > 1. cover the test gap > 2. find out potential missing intrinsics.
> > This patch also changes cpu features from `vm.opt.UseXxx` to `xxx`, as it's easier to find out these tests in the future. > > NOTE: There are still some other test should be enabled, but currently they fail when simply enable them, they will be further investigated, later could be enabled in other PRs with additional code change/implementation if it's feasible. > > Thanks! Thanks! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23985#pullrequestreview-2678850518 From epeter at openjdk.org Wed Mar 12 15:13:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Mar 2025 15:13:57 GMT Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 03:00:39 GMT, Xiaohong Gong wrote: >> Since PR [1] has added several new vector operations in VectorAPI and the X86 backend implementation for them, this patch adds the AArch64 backend part for NEON/SVE architectures. >> >> The performance of Vector API relative JMH micro benchmarks can improve about 70x ~ 95x on a NVIDIA Grace CPU, which is a 128-bit vector length sve2 architecture, with different UseSVE options. 
Here is the gain details: >> >> >> Benchmark (size) Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2 >> ByteMaxVector.SADD 1024 thrpt 30 80.69x 79.70x 80.534x >> ByteMaxVector.SADDMasked 1024 thrpt 30 84.08x 85.72x 85.901x >> ByteMaxVector.SSUB 1024 thrpt 30 80.46x 80.27x 81.063x >> ByteMaxVector.SSUBMasked 1024 thrpt 30 83.96x 85.26x 85.887x >> ByteMaxVector.SUADD 1024 thrpt 30 80.43x 80.36x 81.761x >> ByteMaxVector.SUADDMasked 1024 thrpt 30 83.40x 84.62x 85.199x >> ByteMaxVector.SUSUB 1024 thrpt 30 79.93x 79.22x 79.714x >> ByteMaxVector.SUSUBMasked 1024 thrpt 30 82.93x 85.02x 84.726x >> ByteMaxVector.UMAX 1024 thrpt 30 78.73x 77.39x 78.220x >> ByteMaxVector.UMAXMasked 1024 thrpt 30 82.62x 84.77x 85.531x >> ByteMaxVector.UMIN 1024 thrpt 30 79.04x 77.80x 78.471x >> ByteMaxVector.UMINMasked 1024 thrpt 30 83.11x 84.86x 86.126x >> IntMaxVector.SADD 1024 thrpt 30 83.11x 83.07x 83.183x >> IntMaxVector.SADDMasked 1024 thrpt 30 90.67x 91.80x 93.162x >> IntMaxVector.SSUB 1024 thrpt 30 83.37x 82.82x 83.317x >> IntMaxVector.SSUBMasked 1024 thrpt 30 90.85x 92.87x 94.201x >> IntMaxVector.SUADD 1024 thrpt 30 82.76x 81.78x 82.679x >> IntMaxVector.SUADDMasked 1024 thrpt 30 90.49x 91.93x 93.155x >> IntMaxVector.SUSUB 1024 thrpt 30 82.92x 82.34x 82.525x >> IntMaxVector.SUSUBMasked 1024 thrpt 30 90.60x 92.12x 92.951x >> IntMaxVector.UMAX 1024 thrpt 30 82.40x 81.85x 82.242x >> IntMaxVector.UMAXMasked 1024 thrpt 30 90.30x 92.10x 92.587x >> IntMaxVector.UMIN 1024 thrpt 30 82.84x 81.43x 82.801x >> IntMaxVector.UMINMasked 1024 thrpt 30 90.43x 91.49x 92.678x >> LongMaxVector.SADD 102... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: > > - Merge branch 'jdk:master' into JDK_8349522 > - 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations > > Since PR [1] has added several new vector operations in VectorAPI > and the X86 backend implementation for them, this patch adds the > AArch64 backend part for NEON/SVE architectures. > > The performance of Vector API relative jmh micro benchmarks can > improve about 70x ~ 95x on an AArch64 128-bit vector length sve2 > architecture with different UseSVE options. Here is the uplift > details: > > ``` > Benchmark (size) Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2 > ByteMaxVector.SADD 1024 thrpt 30 80.69x 79.70x 80.534x > ByteMaxVector.SADDMasked 1024 thrpt 30 84.08x 85.72x 85.901x > ByteMaxVector.SSUB 1024 thrpt 30 80.46x 80.27x 81.063x > ByteMaxVector.SSUBMasked 1024 thrpt 30 83.96x 85.26x 85.887x > ByteMaxVector.SUADD 1024 thrpt 30 80.43x 80.36x 81.761x > ByteMaxVector.SUADDMasked 1024 thrpt 30 83.40x 84.62x 85.199x > ByteMaxVector.SUSUB 1024 thrpt 30 79.93x 79.22x 79.714x > ByteMaxVector.SUSUBMasked 1024 thrpt 30 82.93x 85.02x 84.726x > ByteMaxVector.UMAX 1024 thrpt 30 78.73x 77.39x 78.220x > ByteMaxVector.UMAXMasked 1024 thrpt 30 82.62x 84.77x 85.531x > ByteMaxVector.UMIN 1024 thrpt 30 79.04x 77.80x 78.471x > ByteMaxVector.UMINMasked 1024 thrpt 30 83.11x 84.86x 86.126x > IntMaxVector.SADD 1024 thrpt 30 83.11x 83.07x 83.183x > IntMaxVector.SADDMasked 1024 thrpt 30 90.67x 91.80x 93.162x > IntMaxVector.SSUB 1024 thrpt 30 83.37x 82.82x 83.317x > IntMaxVector.SSUBMasked 1024 thrpt 30 90.85x 92.87x 94.201x > IntMaxVector.SUADD 1024 thrpt 30 82.76x 81.78x 82.679x > IntMaxVector.SUADDMasked 1024 thrpt 30 90.49x 91.93x 93.155x > IntMaxVector.SUSUB 1024 thrpt 30 82.92x 82.34x 82.525x > IntMaxVector.SUSUBMasked 1024 thrpt 30 90.60x 92.12x 92.951x > IntMaxVector.UMAX 1024 thrpt 30 8... 
Let me run some tests for this, and ping me in 1-2 days :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23608#issuecomment-2718222191 From tonyp at openjdk.org Wed Mar 12 15:23:53 2025 From: tonyp at openjdk.org (Antonios Printezis) Date: Wed, 12 Mar 2025 15:23:53 GMT Subject: RFR: 8351662: [Test] RISC-V: enable bunch of IR test In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 14:08:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > There are bunch of IR test not enabled on riscv, it's good to enable them, because enabling them will help to: > 1. cover the test gap > 2. find out potential missing intrinsics. > > This patch also changes cpu features from `vm.opt.UseXxx` to `xxx`, as it's easier to find out these tests in the future. > > NOTE: There are still some other test should be enabled, but currently they fail when simply enable them, they will be further investigated, later could be enabled in other PRs with additional code change/implementation if it's feasible. > > Thanks! Looks good. ------------- Marked as reviewed by tonyp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23985#pullrequestreview-2678926121 From ecaspole at openjdk.org Wed Mar 12 15:54:28 2025 From: ecaspole at openjdk.org (Eric Caspole) Date: Wed, 12 Mar 2025 15:54:28 GMT Subject: RFR: 8346470: Improve WriteBarrier JMH to have old-to-young refs Message-ID: Adds new cases based on the existing ones using an extra iteration setup to allocate into young gen, allowing to test old-to-young stores etc, where the existing code is all old-to-old. 
Here is a run on a standard OCI A1.160 with JDK 25:

Benchmark Mode Cnt Score Error Units
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3448.826 ± 1.620 ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.740 ± 0.151 ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.353 ± 0.860 ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.565 ± 0.132 ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge avgt 12 4476.390 ± 0.930 ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall avgt 12 73.517 ± 0.082 ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealLarge avgt 12 3103.911 ± 2.079 ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealSmall avgt 12 57.549 ± 0.512 ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge avgt 12 9587.762 ± 2.044 ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall avgt 12 157.244 ± 0.169 ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge avgt 12 3103.191 ± 1.100 ns/op
WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall avgt 12 57.392 ± 0.624 ns/op
WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPath avgt 12 2.668 ± 0.001 ns/op
WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPathYoungRef avgt 12 9.337 ± 0.001 ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3449.234 ± 1.840 ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.720 ± 0.079 ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.055 ± 0.665 ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.555 ± 0.116 ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge avgt 12 4476.646 ± 1.560 ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall avgt 12 73.525 ± 0.070 ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathRealLarge avgt 12 3104.395 ± 1.537 ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathRealSmall avgt 12 57.766 ± 0.136 ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge avgt 12 9586.585 ± 1.086 ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall avgt 12 157.147 ± 0.128 ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge avgt 12 3103.531 ± 0.883 ns/op
WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall avgt 12 57.669 ± 0.620 ns/op
WriteBarrier.WithoutUnrolling.testFieldWriteBarrierFastPath avgt 12 2.668 ± 0.001 ns/op
WriteBarrier.WithoutUnrolling.testFieldWriteBarrierFastPathYoungRef avgt 12 9.337 ± 0.001 ns/op

------------- Commit messages: - cleanup and copyright year - 8346470: Improve WriteBarrier JMH to have old-to-young refs Changes: https://git.openjdk.org/jdk/pull/24010/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24010&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346470 Stats: 94 lines in 1 file changed: 92 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24010.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24010/head:pull/24010 PR: https://git.openjdk.org/jdk/pull/24010 From cushon at openjdk.org Wed Mar 12 16:03:42 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Wed, 12 Mar 2025 16:03:42 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v3] In-Reply-To: References: Message-ID: > Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. > > https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. > This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these > transitive uses to the worklist. 
> > The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for > CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". > We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/c2/TestAndConZeroCCP.java Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23871/files - new: https://git.openjdk.org/jdk/pull/23871/files/a1d7826a..60ccd522 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23871/head:pull/23871 PR: https://git.openjdk.org/jdk/pull/23871 From cushon at openjdk.org Wed Mar 12 16:08:16 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Wed, 12 Mar 2025 16:08:16 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v2] In-Reply-To: References: Message-ID: <_e7HX1nWHFx61hN64q86Mi2arn_2CNKQ7_AzN_crizE=.35ffef5f-ebf6-439f-9f03-63c64b0adc04@github.com> On Wed, 12 Mar 2025 12:14:39 GMT, Emanuel Peter wrote: >> Liam Miller-Cushon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: >> >> - copyright >> - style >> - Merge branch 'openjdk:master' into mernst/JDK-8350563 >> - RegTest >> - Merge branch 'openjdk:master' into mernst/JDK-8350563 >> - push `con->(cast*)->and` uses > > src/hotspot/share/opto/phaseX.cpp line 2008: > >> 2006: ((use_op == Op_LShiftI || use_op == Op_LShiftL) && use->in(2) == parent)) { >> 2007: >> 2008: auto push_and_uses_to_worklist = [&](Node* n) { > > Amazing, this looks much better. I suggest you rename `new_type` -> `parent_type`, just to keep things consistent. Done > test/hotspot/jtreg/compiler/c2/TestAndConZeroCCP.java line 1: > >> 1: /* > > The test should probably be moved to `test/hotspot/jtreg/compiler/ccp/`, that would be more specific. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23871#discussion_r1991831170 PR Review Comment: https://git.openjdk.org/jdk/pull/23871#discussion_r1991831262 From cushon at openjdk.org Wed Mar 12 16:08:14 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Wed, 12 Mar 2025 16:08:14 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v4] In-Reply-To: References: Message-ID: <5_7_WclfMhPtMey2k2Ty5ryRx3PTuqLNyG7kjWlEOlA=.338d31a7-1446-4864-9a77-acce761efd31@github.com> > Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. > > https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. > This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these > transitive uses to the worklist. > > The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for > CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". 
> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23871/files - new: https://git.openjdk.org/jdk/pull/23871/files/60ccd522..f28c1d46 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=02-03 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23871/head:pull/23871 PR: https://git.openjdk.org/jdk/pull/23871 From iklam at openjdk.org Wed Mar 12 16:26:05 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 12 Mar 2025 16:26:05 GMT Subject: RFR: 8350982: -server|-client causes fatal exception on static JDK [v3] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 15:28:25 GMT, Jiangli Zhou wrote: >> Please review the `Arguments::parse_each_vm_init_arg` change to ignore`-server|-client` options, which avoids unrecognized option error on static JDK. >> >> On regular JDK, '-server|-client' options are processed/removed from command-line arguments by `CheckJvmType` during `CreateExecutionEnvironment`. That happens before `Arguments::parse_each_vm_init_arg` is called. With jvm.cfg setting, only server vm is known and client is ignored. So specifying '-server' and '-client' in command-line is really a no-op. >> >> On static JDK, the VM is statically linked with the launcher, and `CreateExecutionEnvironment` & `CheckJvmType` are not called. As the result, `Arguments::parse_each_vm_init_arg` could see `-server|-client` when running on static JDK, if the options are specified in the command line. 
> > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Remove @bug and update @summary. CDS test changes look good to me. ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23881#pullrequestreview-2679130059 From mli at openjdk.org Wed Mar 12 17:03:57 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 12 Mar 2025 17:03:57 GMT Subject: RFR: 8351662: [Test] RISC-V: enable bunch of IR test In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 14:59:23 GMT, Robbin Ehn wrote: >> Hi, >> Can you help to review this patch? >> >> There are bunch of IR test not enabled on riscv, it's good to enable them, because enabling them will help to: >> 1. cover the test gap >> 2. find out potential missing intrinsics. >> >> This patch also changes cpu features from `vm.opt.UseXxx` to `xxx`, as it's easier to find out these tests in the future. >> >> NOTE: There are still some other test should be enabled, but currently they fail when simply enable them, they will be further investigated, later could be enabled in other PRs with additional code change/implementation if it's feasible. >> >> Thanks! > > Thanks! Thank you @robehn @gctony ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23985#issuecomment-2718544534 From mli at openjdk.org Wed Mar 12 17:06:31 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 12 Mar 2025 17:06:31 GMT Subject: RFR: 8351876: RISC-V: enable and fix some float round tests Message-ID: Hi, Can you help to review this simple patch? It's a follow-up of https://github.com/openjdk/jdk/pull/23985. 
Thanks ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/24015/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24015&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351876 Stats: 13 lines in 2 files changed: 8 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24015/head:pull/24015 PR: https://git.openjdk.org/jdk/pull/24015 From jiangli at openjdk.org Wed Mar 12 17:23:10 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 12 Mar 2025 17:23:10 GMT Subject: RFR: 8350982: -server|-client causes fatal exception on static JDK In-Reply-To: <7YqG7ryKpJEk6hBnKnTj3Y_O2p_9x7XfT0M6jNw7nQg=.c5b75d99-9ce0-4898-a174-3646cfc4402a@github.com> References: <1UivDxf7iNhuhkTsh0S60VECAnkYkH4HQrYMmlCrZy0=.179c743b-4495-498c-b4ab-9fc0efce9467@github.com> <7YqG7ryKpJEk6hBnKnTj3Y_O2p_9x7XfT0M6jNw7nQg=.c5b75d99-9ce0-4898-a174-3646cfc4402a@github.com> Message-ID: On Mon, 10 Mar 2025 19:54:09 GMT, Alan Bateman wrote: >> Jiangli and I chatted about this today. We don't think there will be developers looking to specify -server or -client to a static image, instead this is more about the tests. So we think the best thing is to look at the tests that still specify -server and see if it can be dropped. Some of the tests (say for C2) might be better off using `@requires vm.compiler2.enabled` or `@requires vm.flavor == "server"`. > >> @AlanBateman @dholmes-ora @iklam Do you have any other comments/questions about the change? @vnkozlov or others from compiler side, can you please take a look at the change as well? Thanks > > I wasn't initially sure about XShareAuto.java but I see the exchange between you and Ioi so I think all good. @AlanBateman @iklam Thanks for the review! Moving forward on this now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23881#issuecomment-2718588289 From jiangli at openjdk.org Wed Mar 12 17:23:11 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 12 Mar 2025 17:23:11 GMT Subject: Integrated: 8350982: -server|-client causes fatal exception on static JDK In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 01:43:29 GMT, Jiangli Zhou wrote: > Please review the `Arguments::parse_each_vm_init_arg` change to ignore`-server|-client` options, which avoids unrecognized option error on static JDK. > > On regular JDK, '-server|-client' options are processed/removed from command-line arguments by `CheckJvmType` during `CreateExecutionEnvironment`. That happens before `Arguments::parse_each_vm_init_arg` is called. With jvm.cfg setting, only server vm is known and client is ignored. So specifying '-server' and '-client' in command-line is really a no-op. > > On static JDK, the VM is statically linked with the launcher, and `CreateExecutionEnvironment` & `CheckJvmType` are not called. As the result, `Arguments::parse_each_vm_init_arg` could see `-server|-client` when running on static JDK, if the options are specified in the command line. This pull request has now been integrated. 
Changeset: 02c850fc Author: Jiangli Zhou URL: https://git.openjdk.org/jdk/commit/02c850fca87372173eadba18dfa0231df33bebb0 Stats: 19 lines in 14 files changed: 2 ins; 5 del; 12 mod 8350982: -server|-client causes fatal exception on static JDK Reviewed-by: iklam, alanb ------------- PR: https://git.openjdk.org/jdk/pull/23881 From sparasa at openjdk.org Wed Mar 12 17:58:03 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 12 Mar 2025 17:58:03 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v2] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 19:11:36 GMT, Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> revert to nf version for {pop/tz/lz}cnt count instructions > > src/hotspot/cpu/x86/x86_64.ad line 4398: > >> 4396: %} >> 4397: ins_encode %{ >> 4398: __ eshrl($dst$$Register, $mem$$Address, markWord::klass_shift_at_offset, false); > > This change could be done as part of loadNKlassCompactHeaders instruct itself as there is no additional register needed. > Something like below: > if (UseAPX) { > __ eshrl($dst$$Register, $mem$$Address, markWord::klass_shift_at_offset, false); > } else { > __ movl($dst$$Register, $mem$$Address); > __ shrl($dst$$Register, markWord::klass_shift_at_offset); > } Please see this suggestion incorporated in the updated code. > src/hotspot/cpu/x86/x86_64.ad line 5587: > >> 5585: >> 5586: instruct countLeadingZerosI_mem_nf(rRegI dst, memory src) %{ >> 5587: predicate(UseAPX && UseCountLeadingZerosInstruction); > > This instruct could be removed as this is already a unary operation with separate destination. Please see this suggestion incorporated in the updated code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1992011506 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1992012405 From sparasa at openjdk.org Wed Mar 12 17:58:06 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 12 Mar 2025 17:58:06 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v3] In-Reply-To: <7DTflbzBDB9d3ybuR9Zf-opTa59AV5rCrStYBCluhgg=.703447f7-056f-4425-8867-b87a5c8e4c5f@github.com> References: <7DTflbzBDB9d3ybuR9Zf-opTa59AV5rCrStYBCluhgg=.703447f7-056f-4425-8867-b87a5c8e4c5f@github.com> Message-ID: On Wed, 5 Mar 2025 18:21:00 GMT, Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> remove epopcount, elzcnt, etzcnt > > src/hotspot/cpu/x86/x86_64.ad line 5796: > >> 5794: %} >> 5795: >> 5796: > > A nit pick, unnecessary extra blank lines :). Please see the additional blank lines removed in the updated code. > src/hotspot/cpu/x86/x86_64.ad line 6239: > >> 6237: >> 6238: >> 6239: instruct cmovI_regUCF2_ne(cmpOpUCF2 cop, rFlagsRegUCF cr, rRegI dst, rRegI src) %{ > > The cmovI_regUCF2_ne, cmovl_regUCF2_eq, cmovP_regUCF2_ne, cmovP_regUCF2_eq, cmovL_regUCF2_ne, cmovL_regUCF2_eq instructs could also use the ecmovl() instructions. Please see the added NDD versions for cmovI_regUCF2_ne, cmovl_regUCF2_eq, cmovP_regUCF2_ne, cmovP_regUCF2_eq, cmovL_regUCF2_ne, cmovL_regUCF2_eq. > src/hotspot/cpu/x86/x86_64.ad line 6871: > >> 6869: predicate(UseAPX); >> 6870: match(Set dst (AddI src1 src2)); >> 6871: effect(KILL cr); > > We should also bring in the corresponding flag(PD::...); line from instruct addI_rReg in this and other rules where applicable. Please see the updated code with flags(PD::...) 
> src/hotspot/cpu/x86/x86_64.ad line 8253: > >> 8251: %} >> 8252: >> 8253: instruct negI_rReg_ndd(rRegI src, rRegI dst, immI_0 zero, rFlagsReg cr) > > A nit pick in many of the new negI/negL instructs, we usually list the dst first in instruct. Please see the order of dst and src fixed. > src/hotspot/cpu/x86/x86_64.ad line 9060: > >> 9058: >> 9059: // Arithmetic Shift Right by variable >> 9060: instruct sarI_rReg_CL_ndd(rRegI dst, rRegI src, rcx_RegI shift, rFlagsReg cr) > > The new instructs sarI_rReg_CL_ndd, shrI_rReg_CL_ndd, salL_rReg_CL_ndd, sarL_rReg_CL_ndd, shrL_rReg_CL_ndd could be removed and the original !bmi2 versions could be kept. We dont need to optimize with APX instructions for non bmi2 platforms. Please see the updated code which removed NDD support for non bmi2 platforms. > src/hotspot/cpu/x86/x86_64.ad line 10401: > >> 10399: %} >> 10400: >> 10401: instruct orI_rReg_imm_rReg_ndd(rRegI dst, immI src1, rRegI src2, rFlagsReg cr) > > It looks to me that we only need one of orI_rReg_rReg_imm_ndd or orI_rReg_imm_rReg_ndd as orI is a commutative operator. After doing a quick test, it was noticed that both the rules (orI_rReg_rReg_imm and orI_rReg_imm_rReg) are needed to generate NDD instruction depending on the position of the immediate value. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1992013267 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1992014445 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1992015478 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1992016105 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1992017266 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1992021443 From sparasa at openjdk.org Wed Mar 12 17:58:07 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 12 Mar 2025 17:58:07 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v4] In-Reply-To: References: Message-ID: <3pXUHF0RVN38bsYNRGoiKGcekAxh-G80em4zyT35jEE=.5f2c6fe4-e55a-432e-b04f-278c62e234f0@github.com> On Thu, 6 Mar 2025 01:04:32 GMT, Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> add flag(PD::...) and clean up loadNKlassCompactHeaders > > src/hotspot/cpu/x86/x86_64.ad line 9744: > >> 9742: instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg cr) >> 9743: %{ >> 9744: predicate(!UseAPX && !VM_Version::supports_bmi2() && n->bottom_type()->basic_type() == T_INT); > > Don't need to change anything for non bmi2 platforms. The original predicate can be kept as is. This applies to all rorI, rolI, rorL, rolL. Please see the updated code removing rorI, rolI, rorL, rolL for non bmi2 platforms. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1992022343 From sparasa at openjdk.org Wed Mar 12 18:34:16 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 12 Mar 2025 18:34:16 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v10] In-Reply-To: References: Message-ID: <63osFViER9E_8AGLrVJ5wP6nnkiwqYK0b7EOvBooDFU=.76765dd2-6c58-4bd3-99c1-6ddd75f58ad0@github.com> > The goal of this PR is to generate code using APX NDD instructions. Srinivas Vamsi Parasa has updated the pull request incrementally with two additional commits since the last revision: - restore eorl for RIR - Remove randomly generated test_reg2 for dst= rax test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23501/files - new: https://git.openjdk.org/jdk/pull/23501/files/cb21e92a..cced83e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=08-09 Stats: 1384 lines in 3 files changed: 213 ins; 200 del; 971 mod Patch: https://git.openjdk.org/jdk/pull/23501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23501/head:pull/23501 PR: https://git.openjdk.org/jdk/pull/23501 From kvn at openjdk.org Wed Mar 12 18:34:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 12 Mar 2025 18:34:55 GMT Subject: RFR: 8350866: [x86] Add C1 intrinsics for CRC32-C [v4] In-Reply-To: References: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> Message-ID: <1yR0hHdAPNcepdh0K_QseWxBqfU-YVxSplieQEYDV58=.9173a4bd-4ee1-4e04-a419-3391a82dcda2@github.com> On Tue, 11 Mar 2025 10:56:18 GMT, David Linus Briemann wrote: >> Local benchmarks show good improvements for the crc32c intrinsification: >> >> >> without intrinsic (master): >> >> >> $JDK/java -DmsgSize=5120 -XX:TieredStopAtLevel=1 -Xcomp TestCRC32C 300000 >> offset = 0 >> msgSize = 5120 bytes >> iters = 300000 >> 
------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(byte[]) runtime = 1.186507782 seconds >> CRC32C.update(byte[]) throughput = 1294.5553525244388 MB/s >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(ByteBuffer) runtime = 1.355515648 seconds >> CRC32C.update(ByteBuffer) throughput = 1133.1481139788364 MB/s >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- >> >> >> with intrinsic: >> >> >> $JDK/java -DmsgSize=5120 -XX:TieredStopAtLevel=1 -Xcomp TestCRC32C 300000 >> offset = 0 >> msgSize = 5120 bytes >> iters = 300000 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(byte[]) runtime = 0.065003188 seconds >> CRC32C.update(byte[]) throughput = 23629.610289267657 MB/s >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(ByteBuffer) runtime = 0.072310133 seconds >> CRC32C.update(ByteBuffer) throughput = 21241.836189127185 MB/s >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- > > David Linus Briemann has updated the pull request incrementally with two additional commits since the last revision: > > - fix > - address review comments My testing passed without new failures. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23826#pullrequestreview-2679519375 From kvn at openjdk.org Wed Mar 12 18:40:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 12 Mar 2025 18:40:52 GMT Subject: RFR: 8350840: C2: x64 Assembler::vpcmpeqq assert: failed: XMM register should be 0-15 In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 10:46:39 GMT, Jatin Bhateja wrote: > This bug fix patch addressed an assertion failure due to unexpected register operand encoding. > AVX2 flavour of instruction "vpcmpeqq" expects to operate over XMM registers from lower register bank (0-15), in this case, the register mask associated with the destination vector operand of the matcher pattern also includes registers from the higher bank. > > The issue can be reliably reproduced if we modify the static allocation order of XMM register through AD file change. > Existing bug [JDK-8343294](https://bugs.openjdk.org/browse/JDK-8343294) already tracks the requirement to randomize the allocation ordering. > > Kindly review and share your feedback. > > Best Regards, > Jatin My testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23979#pullrequestreview-2679533552 From kvn at openjdk.org Wed Mar 12 19:05:06 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 12 Mar 2025 19:05:06 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 22:05:13 GMT, Chad Rakoczy wrote: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. 
The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. A few comments. Note, you can use `memcpy` because we don't have nmethod's virtual pointer anymore. src/hotspot/share/code/nmethod.cpp line 1396: > 1394: } > 1395: > 1396: nmethod::nmethod(nmethod& nm) : CodeBlob(nm.name(), CodeBlobKind::Nmethod, nm.size(), nm.header_size()) Should this be a `clone()` method instead of a constructor? Then you would not need `new()`. src/hotspot/share/code/nmethod.cpp line 1399: > 1397: { > 1398: debug_only(NoSafepointVerifier nsv;) > 1399: assert_locked_or_safepoint(CodeCache_lock); Is this lock enough to prevent the GC from scanning it before you finish initializing it? src/hotspot/share/code/nmethod.cpp line 1406: > 1404: _oop_maps = nm.oop_maps()->clone(); > 1405: } > 1406: _relocation_size = nm._relocation_size; Did you consider using `memcpy()` and updating only the changed fields? src/hotspot/share/code/nmethod.cpp line 1514: > 1512: > 1513: // Copy all nmethod data outside of header > 1514: memcpy(content_begin(), nm.content_begin(), nm.size() - nm.header_size()); You would not need this if you `memcpy` the whole nmethod. src/hotspot/share/code/nmethod.cpp line 1595: > 1593: } > 1594: > 1595: bool nmethod::is_relocatable() const { Native nmethods should be skipped too. Maybe also check `is_in_use()`.
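
For illustration only (this is not code from the PR): the `memcpy` suggestions above could look roughly like the sketch below, assuming a blob that consists of a trivially copyable header followed by payload bytes. `BlobHeader`, `payload_of`, `make_blob` and `clone_blob` are hypothetical names, not HotSpot APIs, and a real nmethod has further state that would likely need fixing up after the copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical stand-in for a code blob: a fixed header followed by payload bytes.
struct BlobHeader {
  std::size_t total_size;   // header + payload, in bytes
  std::size_t header_size;  // offset of the payload from the start of the blob
  int         id;           // identity field that must differ in the copy
  bool        in_use;       // state field that is reset in the copy
};

// memcpy of the header is only well-defined for trivially copyable types,
// i.e. types without a virtual table pointer.
static_assert(std::is_trivially_copyable<BlobHeader>::value,
              "BlobHeader must be trivially copyable");

// Start of the payload that follows the header.
inline char* payload_of(BlobHeader* b) {
  return reinterpret_cast<char*>(b) + b->header_size;
}

// Allocate a blob and copy payload_size bytes of payload behind the header.
inline BlobHeader* make_blob(int id, const void* payload, std::size_t payload_size) {
  const std::size_t total = sizeof(BlobHeader) + payload_size;
  void* mem = std::malloc(total);
  if (mem == nullptr) return nullptr;
  BlobHeader* b = new (mem) BlobHeader{total, sizeof(BlobHeader), id, true};
  std::memcpy(payload_of(b), payload, payload_size);
  return b;
}

// Clone by copying the whole blob with one memcpy, then updating only the
// fields that must differ between the original and the copy.
inline BlobHeader* clone_blob(const BlobHeader* src, int new_id) {
  void* mem = std::malloc(src->total_size);
  if (mem == nullptr) return nullptr;
  std::memcpy(mem, src, src->total_size);
  BlobHeader* copy = static_cast<BlobHeader*>(mem);
  copy->id = new_id;
  copy->in_use = false;  // in this sketch the copy starts out unpublished
  return copy;
}
```

The `static_assert` is the part that corresponds to the remark about the virtual pointer: once the type has no vtable pointer, a whole-object byte copy followed by targeted field updates is an option.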
------------- PR Review: https://git.openjdk.org/jdk/pull/23573#pullrequestreview-2679552647 PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2718841950 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1992116881 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1992127297 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1992104534 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1992111015 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1992115388 From sparasa at openjdk.org Wed Mar 12 20:29:10 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 12 Mar 2025 20:29:10 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v11] In-Reply-To: References: Message-ID: > The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. > > For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. 
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: undo BoxLock node fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23501/files - new: https://git.openjdk.org/jdk/pull/23501/files/cced83e2..126dd779 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=09-10 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23501/head:pull/23501 PR: https://git.openjdk.org/jdk/pull/23501 From dlong at openjdk.org Wed Mar 12 20:40:53 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 12 Mar 2025 20:40:53 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> Message-ID: On Wed, 12 Mar 2025 07:55:03 GMT, Damon Fenacci wrote: >> Also, to compute `from_pct`, we end up multiplying and then dividing by the same value `b->_freq`, which cancel out and simplify to `100 * b->succ_prob(j)`. Furthermore, succ_prob() should always return a value between 0.0 and 1.0, so the real problem is probably only `to_pct` and very small values of `target->_freq`. > >> Do you think those high values are not expected ? > > Sorry, my mistake. As @dean-long pointed out they are to be expected with very small values of `target->_freq` I think it would still be helpful to understand what kind of situations cause these extreme values. I know there are places where we have to adjust for problematic 0 counts, so I'm wondering if something like that is happening here. 
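
To make the arithmetic in the comment above concrete, here is a small standalone sketch. It is not the code in block.cpp and not necessarily the fix this PR will take. It assumes the percentages are computed as `100 * freq / x` with `freq = b_freq * succ_prob`, as the comment describes; `percent_as_int`, `from_pct` and `to_pct` are hypothetical helpers.

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// Hypothetical helper: convert a percentage to int without undefined
// behavior when the value is NaN or outside the range of int.
inline int percent_as_int(double pct) {
  if (std::isnan(pct)) return 0;
  if (pct >= static_cast<double>(INT_MAX)) return INT_MAX;
  if (pct <= static_cast<double>(INT_MIN)) return INT_MIN;
  return static_cast<int>(pct);
}

// (100 * b_freq * succ_prob) / b_freq simplifies to 100 * succ_prob for a
// non-zero b_freq, so with succ_prob in [0.0, 1.0] this stays in [0, 100].
inline double from_pct(double succ_prob) {
  return 100.0 * succ_prob;
}

// Dividing by a very small target_freq is what can produce huge values.
inline double to_pct(double b_freq, double succ_prob, double target_freq) {
  return (100.0 * b_freq * succ_prob) / target_freq;
}
```

With `target_freq` around `1e-38`, `to_pct(1.0, 0.5, 1e-38)` is about `5e39`, far beyond what an `int` can hold, so a plain cast would be undefined behavior while the clamped conversion saturates at `INT_MAX`.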
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r1992256072 From sparasa at openjdk.org Wed Mar 12 20:52:07 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 12 Mar 2025 20:52:07 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v12] In-Reply-To: References: Message-ID: <7vVEaiKEa__OeHQl8RjoNE_egAjB4zHpdQ3OKYLgbaI=.8fb97820-8e96-4761-8887-05ea5f4c2373@github.com> > The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. > > For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: undo changes to testing implementation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23501/files - new: https://git.openjdk.org/jdk/pull/23501/files/126dd779..51d0e0d4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=10-11 Stats: 1381 lines in 2 files changed: 198 ins; 213 del; 970 mod Patch: https://git.openjdk.org/jdk/pull/23501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23501/head:pull/23501 PR: https://git.openjdk.org/jdk/pull/23501 From sviswanathan at openjdk.org Wed Mar 12 21:38:55 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 12 Mar 2025 21:38:55 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v12] In-Reply-To: <7vVEaiKEa__OeHQl8RjoNE_egAjB4zHpdQ3OKYLgbaI=.8fb97820-8e96-4761-8887-05ea5f4c2373@github.com> References: <7vVEaiKEa__OeHQl8RjoNE_egAjB4zHpdQ3OKYLgbaI=.8fb97820-8e96-4761-8887-05ea5f4c2373@github.com> 
Message-ID: On Wed, 12 Mar 2025 20:52:07 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. >> >> For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > undo changes to testing implementation Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23501#pullrequestreview-2679927943 From sparasa at openjdk.org Wed Mar 12 21:43:54 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 12 Mar 2025 21:43:54 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v12] In-Reply-To: <7vVEaiKEa__OeHQl8RjoNE_egAjB4zHpdQ3OKYLgbaI=.8fb97820-8e96-4761-8887-05ea5f4c2373@github.com> References: <7vVEaiKEa__OeHQl8RjoNE_egAjB4zHpdQ3OKYLgbaI=.8fb97820-8e96-4761-8887-05ea5f4c2373@github.com> Message-ID: On Wed, 12 Mar 2025 20:52:07 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. >> >> For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. 
> > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > undo changes to testing implementation Hi Vladimir (@vnkozlov), Could you please review this PR? Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2719196758 From fyang at openjdk.org Thu Mar 13 00:36:56 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Mar 2025 00:36:56 GMT Subject: RFR: 8349632: RISC-V: Add Zfa fminm/fmaxm [v4] In-Reply-To: References: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> Message-ID: On Wed, 12 Mar 2025 03:45:36 GMT, Anjian Wen wrote: >> Add RISCV zfa extension fminm/fmaxm >> These two new floating-point instructions can handle NaN input directly, which reduces the number of instructions needed when calculating min or max > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > delete useless comment Looks good. hotspot:tier1 testing is good using qemu-system with UseZfa enabled by default. BTW: A similar thing could be done for the float16 variant after https://github.com/openjdk/jdk/pull/23844 is merged. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23509#pullrequestreview-2680156671 From fyang at openjdk.org Thu Mar 13 00:45:55 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Mar 2025 00:45:55 GMT Subject: RFR: 8351861: RISC-V: add simple assert at arrays_equals_v In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 14:48:24 GMT, Hamlin Li wrote: > Hi, > Can you help to review this trivial patch? > > `arrays_equals_v` and `arrays_equals` are two versions of the same node `AryEqNode`, so the input `elem_size` should be the same and they should share the same `elem_size` assert; this also makes the code below clearer. 
> Although we could at the same time do something similar to https://github.com/openjdk/jdk/pull/24006, the code of `arrays_equals_v` and `arrays_equals` does not seem to require that the input be a byte[] (although at the Java level a string's payload is indeed a byte[]), so I'll just leave it as is. > > Thanks Looks fine. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24008#pullrequestreview-2680170884 From fyang at openjdk.org Thu Mar 13 01:17:52 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Mar 2025 01:17:52 GMT Subject: RFR: 8351876: RISC-V: enable and fix some float round tests In-Reply-To: References: Message-ID: <0z-C5lCQadXofnglPHqbIGv5q0VeDexHgUXAYkEC-zc=.9740d7a8-4b2c-4d8c-84fa-db363685d201@github.com> On Wed, 12 Mar 2025 17:01:14 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > It's a follow-up of https://github.com/openjdk/jdk/pull/23985. > > Thanks test/hotspot/jtreg/compiler/vectorization/TestRoundVectFloat.java line 57: > 55: @IR(applyIfPlatform = {"riscv64", "true"}, > 56: applyIfCPUFeature = {"rvv", "true"}, > 57: applyIf = {"MaxVectorSize", ">= 32"}, Do you have more details about why a smaller `MaxVectorSize` (let's say 16) won't work? Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24015#discussion_r1992504006 From fyang at openjdk.org Thu Mar 13 01:21:59 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Mar 2025 01:21:59 GMT Subject: RFR: 8351839: RISC-V: Fix base offset calculation introduced in JDK-8347489 In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 13:50:43 GMT, Hamlin Li wrote: > Looks good. Thanks for having a quick look. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24006#issuecomment-2719495749 From xgong at openjdk.org Thu Mar 13 01:32:06 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 13 Mar 2025 01:32:06 GMT Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 15:11:37 GMT, Emanuel Peter wrote: > Let me run some tests for this, and ping me in 1-2 days :) That's great. Thanks so much for your help @eme64 ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23608#issuecomment-2719511013 From xgong at openjdk.org Thu Mar 13 01:32:08 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 13 Mar 2025 01:32:08 GMT Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2] In-Reply-To: References: Message-ID: <2HLHGcIDrIUgqdmHR_B4_xgS-IgkOj0WfFbrjjUPfvI=.8679f435-7017-4d9f-b7c2-6d523bb112ea@github.com> On Wed, 12 Mar 2025 08:04:09 GMT, Hao Sun wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge branch 'jdk:master' into JDK_8349522 >> - 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations >> >> Since PR [1] has added several new vector operations in VectorAPI >> and the X86 backend implementation for them, this patch adds the >> AArch64 backend part for NEON/SVE architectures. >> >> The performance of Vector API relative jmh micro benchmarks can >> improve about 70x ~ 95x on an AArch64 128-bit vector length sve2 >> architecture with different UseSVE options. 
Here is the uplift >> details:
>>
>> ```
>> Benchmark                  (size)   Mode  Cnt  -XX:UseSVE=0  -XX:UseSVE=1  -XX:UseSVE=2
>> ByteMaxVector.SADD           1024  thrpt   30        80.69x        79.70x       80.534x
>> ByteMaxVector.SADDMasked     1024  thrpt   30        84.08x        85.72x       85.901x
>> ByteMaxVector.SSUB           1024  thrpt   30        80.46x        80.27x       81.063x
>> ByteMaxVector.SSUBMasked     1024  thrpt   30        83.96x        85.26x       85.887x
>> ByteMaxVector.SUADD          1024  thrpt   30        80.43x        80.36x       81.761x
>> ByteMaxVector.SUADDMasked    1024  thrpt   30        83.40x        84.62x       85.199x
>> ByteMaxVector.SUSUB          1024  thrpt   30        79.93x        79.22x       79.714x
>> ByteMaxVector.SUSUBMasked    1024  thrpt   30        82.93x        85.02x       84.726x
>> ByteMaxVector.UMAX           1024  thrpt   30        78.73x        77.39x       78.220x
>> ByteMaxVector.UMAXMasked     1024  thrpt   30        82.62x        84.77x       85.531x
>> ByteMaxVector.UMIN           1024  thrpt   30        79.04x        77.80x       78.471x
>> ByteMaxVector.UMINMasked     1024  thrpt   30        83.11x        84.86x       86.126x
>> IntMaxVector.SADD            1024  thrpt   30        83.11x        83.07x       83.183x
>> IntMaxVector.SADDMasked      1024  thrpt   30        90.67x        91.80x       93.162x
>> IntMaxVector.SSUB            1024  thrpt   30        83.37x        82.82x       83.317x
>> IntMaxVector.SSUBMasked      1024  thrpt   30        90.85x        92.87x       94.201x
>> IntMaxVector.SUADD           1024  thrpt   30        82.76x        81.78x       82.679x
>> IntMaxVector.SUADDMasked     1024  thrpt   30        90.49x        91.93x       93.155x
>> IntMaxVector.SUSUB           1024  thrpt   30        82.92x        82.34x       82.525x
>> IntMaxVector.SUSUBMasked     1024  thrpt   30        90.60x ...
>> ```
> > LGTM Thanks a lot for your review @shqking @Bhavana-Kilambi ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23608#issuecomment-2719511425 From xgong at openjdk.org Thu Mar 13 01:33:54 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 13 Mar 2025 01:33:54 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 12:45:09 GMT, Emanuel Peter wrote: > It would also be good to add some IR tests, or possibly modify existing IR rules to reflect that more cases now vectorize, according to `Matcher::match_rule_supported_vector`. Sure. I will add some IR tests. 
Thanks for looking at this PR! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2719513238 From duke at openjdk.org Thu Mar 13 02:44:36 2025 From: duke at openjdk.org (kuaiwei) Date: Thu, 13 Mar 2025 02:44:36 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load Message-ID: WIP. It worked for cases in the TestMergeLoads.java and can observe performance improvement in MergeLoadBench.getIntB . Need to check more cases. ------------- Commit messages: - Remove some debug trace - 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load Changes: https://git.openjdk.org/jdk/pull/24023/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345485 Stats: 957 lines in 14 files changed: 911 ins; 0 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From jbhateja at openjdk.org Thu Mar 13 03:39:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 13 Mar 2025 03:39:57 GMT Subject: RFR: 8350840: C2: x64 Assembler::vpcmpeqq assert: failed: XMM register should be 0-15 In-Reply-To: References: Message-ID: <4O1LhQK-YmDV4g7Wyvad7ge6MYx2NUtpSV4VF1VZPHg=.5d3d94e4-70da-4c38-b437-9106ad154550@github.com> On Wed, 12 Mar 2025 18:37:54 GMT, Vladimir Kozlov wrote: >> This bug fix patch addressed an assertion failure due to unexpected register operand encoding. >> AVX2 flavour of instruction "vpcmpeqq" expects to operate over XMM registers from lower register bank (0-15), in this case, the register mask associated with the destination vector operand of the matcher pattern also includes registers from the higher bank. >> >> The issue can be reliably reproduced if we modify the static allocation order of XMM register through AD file change. 
>> Existing bug [JDK-8343294](https://bugs.openjdk.org/browse/JDK-8343294) already tracks the requirement to randomize the allocation ordering. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > My testing passed. Thanks @vnkozlov, @sviswa7 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23979#issuecomment-2719749727 From jbhateja at openjdk.org Thu Mar 13 03:39:58 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 13 Mar 2025 03:39:58 GMT Subject: Integrated: 8350840: C2: x64 Assembler::vpcmpeqq assert: failed: XMM register should be 0-15 In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 10:46:39 GMT, Jatin Bhateja wrote: > This bug fix patch addressed an assertion failure due to unexpected register operand encoding. > AVX2 flavour of instruction "vpcmpeqq" expects to operate over XMM registers from lower register bank (0-15), in this case, the register mask associated with the destination vector operand of the matcher pattern also includes registers from the higher bank. > > The issue can be reliably reproduced if we modify the static allocation order of XMM register through AD file change. > Existing bug [JDK-8343294](https://bugs.openjdk.org/browse/JDK-8343294) already tracks the requirement to randomize the allocation ordering. > > Kindly review and share your feedback. > > Best Regards, > Jatin This pull request has now been integrated. 
Changeset: 41cc049f Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/41cc049f425e0b7c90ad3870102366a836eb2209 Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod 8350840: C2: x64 Assembler::vpcmpeqq assert: failed: XMM register should be 0-15 Reviewed-by: sviswanathan, kvn ------------- PR: https://git.openjdk.org/jdk/pull/23979 From chagedorn at openjdk.org Thu Mar 13 06:55:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Mar 2025 06:55:01 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor Message-ID: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> This patch cleans up the Parse and Template Assertion Predicate elimination code. We now use a single `PredicateVisitor` and share code in a new `EliminateUselessPredicates` class which contains the code previously found in `PhaseIdealLoop::eliminate_useless_predicates()`. ### Unified Logic to Clean Up Parse and Template Assertion Predicates We now use the following algorithm: https://github.com/openjdk/jdk/blob/5e4b6ca0ddafa80eee60690caacd257b74305d4e/src/hotspot/share/opto/predicates.cpp#L1174-L1179 This is different from the old algorithm where we used a single boolean state `_useless`. But that no longer works because when we first mark Template Assertion Predicates useless, we are no longer visiting them when iterating through predicates: https://github.com/openjdk/jdk/blob/a21fa463c4f8d067c18c09a072f3cdfa772aea5e/src/hotspot/share/opto/predicates.hpp#L704-L708 We therefore require a third state. Thus, I introduced a new tri-state `PredicateState` that provides a special `MaybeUseful` value which we can set each Predicate to. #### Ignoring Useless Parse Predicates While working on this patch, I've noticed that we are always visiting Parse Predicates - even when they are useless. 
We should change that to align it with what we have for the other Predicates (changed in JDK-8351280). To make this work, we also replace the `_useless` state in `ParsePredicateNode` with a new `PredicateState`. #### Sharing Code for Parse and Template Assertion Predicates With all the mentioned changes in place, I could nicely share code for the elimination of Parse and Template Assertion Predicates in `EliminateUselessPredicates` by using templates. The following additional changes were required: - Changing the template parameter of `_template_assertion_predicate_opaques` to the more specific `OpaqueTemplateAssertionPredicateNode` type. - Adding accessor methods to get the Predicate lists from `Compile`. - Updating `ParsePredicate::mark_useless()` to pass in `PhaseIterGVN`, as done for Assertion Predicates Note that we still do not directly replace the useless Predicates but rather mark them useless as initiated by JDK-8351280. ### Other Included Changes - During the various refactoring steps, I somehow dropped the code to add newly cloned Template Assertion Predicate to the `_template_assertion_predicate_opaques` list. It was done directly in the old cloning methods. This is not relevant for correctness but could hinder some optimizations. I've added the code now in `Node::clone()` to make sure we do not miss any Template Assertion Predicates (similar to what we do for Parse Predicates already): https://github.com/openjdk/jdk/blob/5e4b6ca0ddafa80eee60690caacd257b74305d4e/src/hotspot/share/opto/node.cpp#L514-L516 - Adding verification code to `TemplateAssertionPredicate::is_predicate()` and `InitializedAssertionPredicate::is_predicate()` to verify that when we find an `Opaque*AssertionPredicate` node that we also find the associated `Halt` node. - Some small refactorings here and there like renamings. Thanks, Christian ------------- Commit messages: - Typos etc. 
- Revert has_halt() verification - cleanup - 8350578: Refactor useless Template Assertion Predicate elimination code Changes: https://git.openjdk.org/jdk/pull/24013/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24013&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350578 Stats: 426 lines in 11 files changed: 234 ins; 139 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/24013.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24013/head:pull/24013 PR: https://git.openjdk.org/jdk/pull/24013 From chagedorn at openjdk.org Thu Mar 13 06:55:05 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Mar 2025 06:55:05 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor In-Reply-To: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: On Wed, 12 Mar 2025 16:18:53 GMT, Christian Hagedorn wrote: > This patch cleans the Parse and Template Assertion Predicate elimination code up. We now use a single `PredicateVisitor` and share code in a new `EliminateUselessPredicates` class which contains the code previously found in `PhaseIdealLoop::eliminate_useless_predicates()`. > > ### Unified Logic to Clean Up Parse and Template Assertion Predicates > We now use the following algorithm: > https://github.com/openjdk/jdk/blob/5e4b6ca0ddafa80eee60690caacd257b74305d4e/src/hotspot/share/opto/predicates.cpp#L1174-L1179 > > This is different from the old algorithm where we used a single boolean state `_useless`. 
But that no longer works because when we first mark Template Assertion Predicates useless, we are no longer visiting them when iterating through predicates: > > https://github.com/openjdk/jdk/blob/a21fa463c4f8d067c18c09a072f3cdfa772aea5e/src/hotspot/share/opto/predicates.hpp#L704-L708 > > We therefore require a third state. Thus, I introduced a new tri-state `PredicateState` that provides a special `MaybeUseful` value which we can set each Predicate to. > > #### Ignoring Useless Parse Predicates > While working on this patch, I've noticed that we are always visiting Parse Predicates - even when they are useless. We should change that to align it with what we have for the other Predicates (changed in JDK-8351280). To make this work, we also replace the `_useless` state in `ParsePredicateNode` with a new `PredicateState`. > > #### Sharing Code for Parse and Template Assertion Predicates > With all the mentioned changes in place, I could nicely share code for the elimination of Parse and Template Assertion Predicates in `EliminateUselessPredicates` by using templates. The following additional changes were required: > > - Changing the template parameter of `_template_assertion_predicate_opaques` to the more specific `OpaqueTemplateAssertionPredicateNode` type. > - Adding accessor methods to get the Predicate lists from `Compile`. > - Updating `ParsePredicate::mark_useless()` to pass in `PhaseIterGVN`, as done for Assertion Predicates > > Note that we still do not directly replace the useless Predicates but rather mark them useless as initiated by JDK-8351280. > > ### Other Included Changes > - During the various refactoring steps, I somehow dropped the code to add newly cloned Template Assertion Predicate to the `_template_assertion_predicate_opaques` list. It was done directly in the old cloning methods. This is not relevant for correctness but could hinder some optimizations. I've added the code now i... 
src/hotspot/share/opto/cfgnode.hpp line 508: > 506: void mark_maybe_useful(); > 507: bool is_useful() const; > 508: void mark_useful(); Needed to move these definitions to the source file because I cannot include `predicates.hpp` here due to circular dependencies. I solved this by forward declaring `PredicateState` and moving the definitions to the source file. Same for these methods for `OpaqueTemplateAssertionPredicate` further down. src/hotspot/share/opto/ifnode.cpp line 2220: > 2218: const Type* ParsePredicateNode::Value(PhaseGVN* phase) const { > 2219: assert(_predicate_state != PredicateState::MaybeUseful, "should only be MaybeUseful when eliminating useless " > 2220: "predicates during loop opts"); Best effort assert to ensure we are not seeing `MaybeUseful` anywhere else except during Predicate elimination. Same for `OpaqueTemplateAssertionPredicate`. src/hotspot/share/opto/ifnode.cpp line 2249: > 2247: fatal("unknown kind"); > 2248: } > 2249: if (_predicate_state == PredicateState::Useless) { I only print `useless` since `MaybeUseful` is only set for a very brief moment and should normally not be visible when dumping/in IGV dumps. src/hotspot/share/opto/node.cpp line 516: > 514: if (n->is_OpaqueTemplateAssertionPredicate()) { > 515: C->add_template_assertion_predicate_opaque(n->as_OpaqueTemplateAssertionPredicate()); > 516: } See PR description "Other Included Changes". src/hotspot/share/opto/opaquenode.cpp line 115: > 113: } > 114: > 115: OpaqueTemplateAssertionPredicateNode::OpaqueTemplateAssertionPredicateNode(BoolNode* bol): Node(nullptr, bol), Moved some methods to source file - see comment above for `ParsePredicateNode`. src/hotspot/share/opto/opaquenode.hpp line 30: > 28: #include "opto/node.hpp" > 29: #include "opto/subnode.hpp" > 30: Noticed that `opcodes.hpp` was unused and `subnode.hpp` missed the `opto` prefix. Fixed here as well. 
src/hotspot/share/opto/predicates.cpp line 177: > 175: bool is_template_assertion_predicate = if_node->in(1)->is_OpaqueTemplateAssertionPredicate(); > 176: assert(!is_template_assertion_predicate || AssertionPredicate::has_halt(maybe_success_proj->as_IfTrue()), > 177: "Template Assertion Predicate must have a Halt Node on the failing path"); New verification - see "Other Included Changes" in PR description. src/hotspot/share/opto/predicates.cpp line 333: > 331: // of the Initialized Assertion Predicate (part of the loop body) while the OpaqueInitializedAssertionPredicate is not > 332: // cloned because it's outside the loop body. We end up sharing the OpaqueInitializedAssertionPredicate between the > 333: // original and the cloned If. This should be fine. Reason for not reusing `AssertionPredicate::has_halt()`. I will revisit the `AssertionPredicate` class in a future PR, so I have not tried to unify these methods in a way. src/hotspot/share/opto/predicates.hpp line 331: > 329: // the ParsePredicateNode is not marked useless. > 330: bool is_valid() const { > 331: return _parse_predicate_node != nullptr && !_parse_predicate_node->is_useless(); Avoids visiting useless Parse Predicates during Predicate iteration. 
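
Illustration (not part of the patch): a minimal sketch of why the elimination needs a third state once iteration skips useless Predicates. It models a three-step mark phase (mark everything `MaybeUseful`, mark what is still reachable `Useful`, turn the rest `Useless`), which is one reading of the description; the linked code is authoritative. `Predicate`, `visit_predicates` and `eliminate_useless` are hypothetical names, and the `Useful` enumerator is an assumption.

```cpp
#include <cassert>
#include <vector>

// Tri-state as described in the PR; the exact enumerator set is an assumption.
enum class PredicateState { Useless, MaybeUseful, Useful };

// Hypothetical stand-in for a predicate node.
struct Predicate {
  PredicateState state = PredicateState::Useful;
  bool is_useless() const { return state == PredicateState::Useless; }
};

// Hypothetical walker: like the predicate iteration discussed in the thread,
// it does not visit predicates that are already marked useless.
template <typename Visitor>
void visit_predicates(const std::vector<Predicate*>& reachable, Visitor visit) {
  for (Predicate* p : reachable) {
    if (!p->is_useless()) {
      visit(*p);
    }
  }
}

inline void eliminate_useless(std::vector<Predicate>& all,
                              const std::vector<Predicate*>& reachable_from_loops) {
  // 1. Everything starts as MaybeUseful, which the walker still visits.
  for (Predicate& p : all) {
    p.state = PredicateState::MaybeUseful;
  }
  // 2. Predicates found while walking from the loops are marked Useful.
  visit_predicates(reachable_from_loops,
                   [](Predicate& p) { p.state = PredicateState::Useful; });
  // 3. Whatever was never reached becomes Useless.
  for (Predicate& p : all) {
    if (p.state == PredicateState::MaybeUseful) {
      p.state = PredicateState::Useless;
    }
  }
}
```

With only a boolean flag, step 1 would have to mark every predicate useless up front, and the walker in step 2 would then skip all of them, so nothing could ever be marked useful again.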
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r1992876906 PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r1992878757 PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r1992879213 PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r1992879809 PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r1992882191 PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r1992882542 PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r1992881495 PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r1992881891 PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r1992884443 From epeter at openjdk.org Thu Mar 13 07:14:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Mar 2025 07:14:11 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v12] In-Reply-To: <7vVEaiKEa__OeHQl8RjoNE_egAjB4zHpdQ3OKYLgbaI=.8fb97820-8e96-4761-8887-05ea5f4c2373@github.com> References: <7vVEaiKEa__OeHQl8RjoNE_egAjB4zHpdQ3OKYLgbaI=.8fb97820-8e96-4761-8887-05ea5f4c2373@github.com> Message-ID: On Wed, 12 Mar 2025 20:52:07 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. >> >> For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > undo changes to testing implementation I had a quick look over this. It's a bit hard to review for me, because it is basically about specific APX instructions. 
We probably have to rely heavily on testing. But APX hardware is not yet available, right? How can we best test this? Is there any way to emulate, maybe using SDE? What testing did you run for this? src/hotspot/cpu/x86/x86_64.ad line 6141: > 6139: match(Set dst (CMoveI (Binary cop cr) (Binary src1 src2))); > 6140: > 6141: ins_cost(200); // XXX What does the XXX stand for? src/hotspot/cpu/x86/x86_64.ad line 6294: > 6292: // Conditional move > 6293: instruct cmovI_rReg_rReg_mem_ndd(rRegI dst, cmpOp cop, rFlagsReg cr, rRegI src1, memory src2) > 6294: %{ Isn't this bracket usually on the previous line? ------------- PR Review: https://git.openjdk.org/jdk/pull/23501#pullrequestreview-2680743351 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1992899760 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1992903548 From duke at openjdk.org Thu Mar 13 07:15:57 2025 From: duke at openjdk.org (kuaiwei) Date: Thu, 13 Mar 2025 07:15:57 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v2] In-Reply-To: References: Message-ID: > WIP. > > It worked for cases in the TestMergeLoads.java and can observe performance improvement in MergeLoadBench.getIntB . Need to check more cases. 
kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Fix test failure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24023/files - new: https://git.openjdk.org/jdk/pull/24023/files/3d6c5795..7a1c524d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From epeter at openjdk.org Thu Mar 13 07:23:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Mar 2025 07:23:55 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v12] In-Reply-To: References: <7vVEaiKEa__OeHQl8RjoNE_egAjB4zHpdQ3OKYLgbaI=.8fb97820-8e96-4761-8887-05ea5f4c2373@github.com> Message-ID: On Wed, 12 Mar 2025 21:41:32 GMT, Srinivas Vamsi Parasa wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> undo changes to testing implementation > > Hi Vladimir (@vnkozlov), > > Could you please review this PR? > > Thanks, > Vamsi @vamsi-parasa I tried to launch testing, but my script fails because of some merge issue. Would you mind merging from master? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2720211581 From epeter at openjdk.org Thu Mar 13 07:29:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Mar 2025 07:29:07 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v11] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 13:32:40 GMT, Roland Westrelin wrote: >> This change refactors `RShiftI`/`RshiftL` `Ideal`, `Identity` and >> `Value` because the `int` and `long` versions are very similar and so >> there's no logic duplication. 
In the process, support for some extra >> transformations is added to `RShiftL`. I also added some new test >> cases. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision: > > - review + test fix > - review > - Merge branch 'master' into JDK-8349361 > - review > - Merge branch 'master' into JDK-8349361 > - review > - review > - review > - Merge branch 'master' into JDK-8349361 > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Emanuel Peter > - ... and 10 more: https://git.openjdk.org/jdk/compare/8d64f5f3...34e925b3 Did another quick scan. Will launch some testing now. src/hotspot/share/opto/mulnode.cpp line 1412: > 1410: } > 1411: > 1412: Node *RShiftINode::Ideal(PhaseGVN *phase, bool can_reshape) { Suggestion: Node* RShiftINode::Ideal(PhaseGVN* phase, bool can_reshape) { test/hotspot/jtreg/compiler/c2/irTests/RShiftLNodeIdealizationTests.java line 79: > 77: long x9 = Integer.max(Integer.min((int)x, (int)test7Max), (int)(test7Min-1)); > 78: Asserts.assertEQ((x9 << test7Shift) >> test7Shift, test9(x)); > 79: Asserts.assertEQ(((x7 << test7Shift) >> test10Shift), test10(x)); Suggestion: Asserts.assertEQ(((x7 << test7Shift) >> test10Shift), test10(x)); ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23438#pullrequestreview-2680790106
PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1992925220
PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1992929457

From fjiang at openjdk.org Thu Mar 13 07:34:02 2025
From: fjiang at openjdk.org (Feilong Jiang)
Date: Thu, 13 Mar 2025 07:34:02 GMT
Subject: RFR: 8351839: RISC-V: Fix base offset calculation introduced in JDK-8347489
In-Reply-To: 
References: 
Message-ID: 

On Wed, 12 Mar 2025 11:24:23 GMT, Fei Yang wrote:

> As discussed in https://github.com/openjdk/jdk/pull/23633#discussion_r1974591975, there is no need to distinguish `T_BYTE` and `T_CHAR` when calculating base offset for strings.
> The reason is that the low-level character storage used for both Latin1 and UTF16 strings is always a byte array [1].
> So we should always use `T_BYTE` for both cases. This won't make a difference on the calculated base offset for now.
> But it's better to fix this for code readability purposes. Sanity tested on linux-riscv64 w/wo COH.
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L160

looks good, thanks.

-------------

Marked as reviewed by fjiang (Committer).

PR Review: https://git.openjdk.org/jdk/pull/24006#pullrequestreview-2680807153

From epeter at openjdk.org Thu Mar 13 07:45:57 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 13 Mar 2025 07:45:57 GMT
Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input [v2]
In-Reply-To: 
References: 
Message-ID: 

On Mon, 10 Mar 2025 21:26:41 GMT, Sandhya Viswanathan wrote:

>> Float.float16ToFloat generates wrong vectorized code in product build and asserts in fastdebug/debug when argument is of type byte, int, or long array. The short term solution is to not auto vectorize in these cases.
>> 
>> Review comments are welcome.
>> 
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
>   review comments

Looks much better :)

You are right, `Generators` are missing cases for `short`, `byte`, `char`. You could leave those cases with regular `Random`, but handle the `int` and `long` cases with `Generators`, to make sure interesting values are added to the mix more frequently.

test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 30:

> 28:  * @summary Test bug fix for JDK-8350835 discovered through Template Framework
> 29:  * @library /test/lib /
> 30:  * @run main/othervm compiler.vectorization.TestFloat16ToFloatConv

Suggestion:

 * @run driver compiler.vectorization.TestFloat16ToFloatConv

I don't think you need a new VM if you have no additional flags ;)

test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 113:

> 111: 
> 112:     @Test
> 113:     // Not vectorized due to JDK-8350835

That's very non-descriptive. Actually, that is the current bug, so this is not even a future RFE that intends to fix it. Can you please say why it is not vectorizing now, and what might be possible conditions when it would be ok to vectorize in the future? Could we even file an RFE for this?

-------------

Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23939#pullrequestreview-2680819988 PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1992943827 PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1992942306 From roland at openjdk.org Thu Mar 13 07:56:10 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 13 Mar 2025 07:56:10 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v12] In-Reply-To: References: Message-ID: > This change refactors `RShiftI`/`RshiftL` `Ideal`, `Identity` and > `Value` because the `int` and `long` versions are very similar and so > there's no logic duplication. In the process, support for some extra > transformations is added to `RShiftL`. I also added some new test > cases. Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/compiler/c2/irTests/RShiftLNodeIdealizationTests.java Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23438/files - new: https://git.openjdk.org/jdk/pull/23438/files/34e925b3..9c0b859f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23438&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23438&range=10-11 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23438/head:pull/23438 PR: https://git.openjdk.org/jdk/pull/23438 From mli at openjdk.org Thu Mar 13 08:14:56 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 13 Mar 2025 08:14:56 GMT Subject: RFR: 8351876: RISC-V: enable and fix some float round tests In-Reply-To: <0z-C5lCQadXofnglPHqbIGv5q0VeDexHgUXAYkEC-zc=.9740d7a8-4b2c-4d8c-84fa-db363685d201@github.com> References: 
<0z-C5lCQadXofnglPHqbIGv5q0VeDexHgUXAYkEC-zc=.9740d7a8-4b2c-4d8c-84fa-db363685d201@github.com> Message-ID: On Thu, 13 Mar 2025 01:15:30 GMT, Fei Yang wrote: >> Hi, >> Can you help to review this simple patch? >> It's a follow-up of https://github.com/openjdk/jdk/pull/23985. >> >> Thanks > > test/hotspot/jtreg/compiler/vectorization/TestRoundVectFloat.java line 57: > >> 55: @IR(applyIfPlatform = {"riscv64", "true"}, >> 56: applyIfCPUFeature = {"rvv", "true"}, >> 57: applyIf = {"MaxVectorSize", ">= 32"}, > > Do you have more details about why a smaller `MaxVectorSize` (Let's say 16) won't work? Thanks. As when RoundVF/D was implemented (https://github.com/openjdk/jdk/pull/17745), the test show e.g. RoundVF only bring performance gain when vlenb >= 32. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24015#discussion_r1992985778 From mli at openjdk.org Thu Mar 13 08:16:02 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 13 Mar 2025 08:16:02 GMT Subject: Integrated: 8351662: [Test] RISC-V: enable bunch of IR test In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 14:08:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > There are bunch of IR test not enabled on riscv, it's good to enable them, because enabling them will help to: > 1. cover the test gap > 2. find out potential missing intrinsics. > > This patch also changes cpu features from `vm.opt.UseXxx` to `xxx`, as it's easier to find out these tests in the future. > > NOTE: There are still some other test should be enabled, but currently they fail when simply enable them, they will be further investigated, later could be enabled in other PRs with additional code change/implementation if it's feasible. > > Thanks! This pull request has now been integrated. 
Changeset: 0e7d460e Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/0e7d460e4f95cb0209f9b815fe8c9846de4c9b7e Stats: 247 lines in 32 files changed: 22 ins; 6 del; 219 mod 8351662: [Test] RISC-V: enable bunch of IR test Reviewed-by: fyang, rehn, tonyp ------------- PR: https://git.openjdk.org/jdk/pull/23985 From epeter at openjdk.org Thu Mar 13 08:17:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Mar 2025 08:17:01 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 19:49:41 GMT, Kangcheng Xu wrote: >> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. >> >> When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) >> >> The following was implemented to address this issue. >> >> if (UseNewCode2) { >> *multiplier = bt == T_INT >> ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows >> : ((jlong) 1) << con->get_int(); >> } else { >> *multiplier = ((jlong) 1 << con->get_int()); >> } >> >> >> Two new bitshift overflow tests were added. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > add micro benchmark This looks really interesting! I see that you are doing some special pattern matching. 
I wonder if it might be worth generalizing the algorithm, to search through an arbitrary "tree" of additions, collect all "leaves" of (`variable * multiplier`), sort by `variable`, and compute new additions for each `variable`. What do you think? src/hotspot/share/opto/addnode.cpp line 407: > 405: } > 406: > 407: // Try to convert a serial of additions into a single multiplication. Also convert `(a * CON) + a` to `(CON + 1) * a` as What about `(a * CON1) + (a * CON2)`? Like `11 * a + 5 * a`. Do we also optimize that? src/hotspot/share/opto/addnode.cpp line 413: > 411: // power-of-2 addition (e.g., 3 * a => (a << 2) + a). Without this check, GVN would keep trying to optimize the same > 412: // node and can't progress. For example, 3 * a => (a << 2) + a => 3 * a => (a << 2) + a => ... > 413: if (find_power_of_two_addition_pattern(this, bt, nullptr) != nullptr) { Where does the optimization `3 * a => (a << 2) + a` happen? Do we use `find_power_of_two_addition_pattern` there too? If not: how do we prevent the code of the two locations from diverging in the future? src/hotspot/share/opto/addnode.cpp line 427: > 425: || find_simple_multiplication_pattern(in1, bt, &multiplier) == in2 > 426: || find_power_of_two_addition_pattern(in1, bt, &multiplier) == in2) { > 427: multiplier++; // +1 for the in2 term Nit: I think we generally have the `||` at the end of the line, not the beginning. src/hotspot/share/opto/addnode.cpp line 447: > 445: > 446: return nullptr; > 447: } I'm not a great fan of "output arguments" such as the `multiplier` here. Why not create a class/struct `Multiplication`, which has a field `valid` (instead of returning `nullptr`). And fields `variable` and `multiplier`. The fields can all be constant. You could even have an `add` method, that adds two such `Multiplication`s together. src/hotspot/share/opto/addnode.cpp line 480: > 478: if (!con->is_Con()) { > 479: swap(con, base); > 480: } Is that necessary? 
Does `Mul` not automatically get canonicalized so that the constant is on the rhs? src/hotspot/share/opto/addnode.cpp line 500: > 498: // Note that one of the term of the addition could simply be `a` (i.e., a << 0). Calling this function with `multiplier` > 499: // being null is safe. > 500: Node* AddNode::find_power_of_two_addition_pattern(Node* n, BasicType bt, jlong* multiplier) { This code here looks quite complicated. Why not parse both sides of the add with a `find_simple_lshift_pattern`, and then check that they use the same variable? test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 43: > 41: */ > 42: public class TestSerialAdditions { > 43: private static final Random RNG = Utils.getRandomInstance(); Could you use `Generators.java` instead? It will produce more "interesting" random values, such as powers of 2, and values close to powers of 2. ------------- PR Review: https://git.openjdk.org/jdk/pull/23506#pullrequestreview-2680840690 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1992974452 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1992971118 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1992953844 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1992966421 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1992978164 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1992980967 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1992984685 From mli at openjdk.org Thu Mar 13 08:17:06 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 13 Mar 2025 08:17:06 GMT Subject: Integrated: 8351861: RISC-V: add simple assert at arrays_equals_v In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 14:48:24 GMT, Hamlin Li wrote: > Hi, > Can you help to review this trivial patch? 
> > `arrays_equals_v` and `arrays_equals` are 2 versions of the same node `AryEqNode`, input `elem_size` should be the same, so should share the same assert of `elem_size`, this also make the code below more clear. > Although at the same time, we could do the similar thing like https://github.com/openjdk/jdk/pull/24006, but as the code of `arrays_equals_v` and `arrays_equals` seems not require the input must be a byte[] (although in the java level a string's payload is indeed a byte[]), so I'll just leave if as is. > > Thanks This pull request has now been integrated. Changeset: 6241d096 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/6241d09657fdd2bbd4f02cf6361df8bd07216147 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod 8351861: RISC-V: add simple assert at arrays_equals_v Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/24008 From fyang at openjdk.org Thu Mar 13 08:19:53 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Mar 2025 08:19:53 GMT Subject: RFR: 8351876: RISC-V: enable and fix some float round tests In-Reply-To: References: <0z-C5lCQadXofnglPHqbIGv5q0VeDexHgUXAYkEC-zc=.9740d7a8-4b2c-4d8c-84fa-db363685d201@github.com> Message-ID: On Thu, 13 Mar 2025 08:11:53 GMT, Hamlin Li wrote: >> test/hotspot/jtreg/compiler/vectorization/TestRoundVectFloat.java line 57: >> >>> 55: @IR(applyIfPlatform = {"riscv64", "true"}, >>> 56: applyIfCPUFeature = {"rvv", "true"}, >>> 57: applyIf = {"MaxVectorSize", ">= 32"}, >> >> Do you have more details about why a smaller `MaxVectorSize` (Let's say 16) won't work? Thanks. > > As when RoundVF/D was implemented (https://github.com/openjdk/jdk/pull/17745), the test show e.g. RoundVF only bring performance gain when vlenb >= 32. Ah, I see. Thanks. 
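
For readers following the `MaxVectorSize` discussion, here is a minimal sketch of the kind of element-wise rounding loop that such IR tests typically exercise. The class and method names are hypothetical and not taken from `TestRoundVectFloat.java`; whether C2 actually turns this loop into `RoundVF` nodes depends on the platform and flags, and with `MaxVectorSize` of 32 bytes a vector would hold 8 `float` lanes.

```java
import java.util.Arrays;

// Hypothetical stand-in for a rounding test kernel; not code from the PR.
final class RoundLoopSketch {
    // Rounds each float to the nearest int. Ties round toward positive
    // infinity, as specified for Math.round(float).
    static int[] roundAll(float[] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.round(in[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] r = roundAll(new float[] {1.4f, 2.5f, -2.5f, -0.2f});
        System.out.println(Arrays.toString(r));
    }
}
```

The scalar result is the reference that a vectorized version must match lane by lane, which is why the tests compare against `Math.round` on the same inputs.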
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24015#discussion_r1992993563 From fyang at openjdk.org Thu Mar 13 08:24:54 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Mar 2025 08:24:54 GMT Subject: RFR: 8351876: RISC-V: enable and fix some float round tests In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 17:01:14 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > It's a follow-up of https://github.com/openjdk/jdk/pull/23985. > > Thanks Looks fine to me modulo one minor comment. test/hotspot/jtreg/compiler/vectorization/TestRoundVectFloat.java line 56: > 54: counts = {IRNode.ROUND_VF , " > 0 "}) > 55: @IR(applyIfPlatform = {"riscv64", "true"}, > 56: applyIfCPUFeature = {"rvv", "true"}, This `applyIfCPUFeature = {"rvv", "true"},` seems unnecessary? You have already requires `(os.simpleArch == "riscv64" & vm.cpu.features ~= ".*rvv.*")` for this test. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24015#pullrequestreview-2680925360 PR Review Comment: https://git.openjdk.org/jdk/pull/24015#discussion_r1992998478 From mli at openjdk.org Thu Mar 13 08:33:57 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 13 Mar 2025 08:33:57 GMT Subject: RFR: 8351876: RISC-V: enable and fix some float round tests In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 08:21:05 GMT, Fei Yang wrote: >> Hi, >> Can you help to review this simple patch? >> It's a follow-up of https://github.com/openjdk/jdk/pull/23985. >> >> Thanks > > test/hotspot/jtreg/compiler/vectorization/TestRoundVectFloat.java line 56: > >> 54: counts = {IRNode.ROUND_VF , " > 0 "}) >> 55: @IR(applyIfPlatform = {"riscv64", "true"}, >> 56: applyIfCPUFeature = {"rvv", "true"}, > > This `applyIfCPUFeature = {"rvv", "true"},` seems unnecessary? You have already requires `(os.simpleArch == "riscv64" & vm.cpu.features ~= ".*rvv.*")` for this test. Yes, it's unnecessary. 
But we require `MaxVectorSize >= 32` below, so maybe it's better to keep it for readability. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24015#discussion_r1993014120 From fyang at openjdk.org Thu Mar 13 08:44:52 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Mar 2025 08:44:52 GMT Subject: RFR: 8351876: RISC-V: enable and fix some float round tests In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 08:31:15 GMT, Hamlin Li wrote: >> test/hotspot/jtreg/compiler/vectorization/TestRoundVectFloat.java line 56: >> >>> 54: counts = {IRNode.ROUND_VF , " > 0 "}) >>> 55: @IR(applyIfPlatform = {"riscv64", "true"}, >>> 56: applyIfCPUFeature = {"rvv", "true"}, >> >> This `applyIfCPUFeature = {"rvv", "true"},` seems unnecessary? You have already requires `(os.simpleArch == "riscv64" & vm.cpu.features ~= ".*rvv.*")` for this test. > > Yes, it's unnecessary. > But we require `MaxVectorSize >= 32` below, so maybe it's better to keep it for readability. I guess it's not a big issue. I just find it's not the case when looking at changes made in this file : test/hotspot/jtreg/compiler/vectorization/TestRoundVectRiscv64.java ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24015#discussion_r1993028605 From mli at openjdk.org Thu Mar 13 09:02:08 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 13 Mar 2025 09:02:08 GMT Subject: RFR: 8351902: RISC-V: Several tests fail after JDK-8351145 Message-ID: Hi, Can you help to review this simple patch? These client tests seems not that useful to me, so the simple solution could be just disable them on riscv. Thanks! 
------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/24027/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24027&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351902 Stats: 3 lines in 3 files changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24027.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24027/head:pull/24027 PR: https://git.openjdk.org/jdk/pull/24027 From epeter at openjdk.org Thu Mar 13 09:11:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Mar 2025 09:11:54 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: <3YGofYXMD3fLbbCcOOtRLbHp8qLnyTcZ_lY8gOIzT-A=.39d80f33-47a8-4ef4-ab6d-700ee6eaa346@github.com> On Tue, 11 Mar 2025 19:49:41 GMT, Kangcheng Xu wrote: >> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. >> >> When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) >> >> The following was implemented to address this issue. >> >> if (UseNewCode2) { >> *multiplier = bt == T_INT >> ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows >> : ((jlong) 1) << con->get_int(); >> } else { >> *multiplier = ((jlong) 1 << con->get_int()); >> } >> >> >> Two new bitshift overflow tests were added. 
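
The overflow case described above can be reproduced at the Java level. The following is only an illustrative sketch with hypothetical names, not code from the patch: it shows that an `int` shift masks the shift count to 5 bits, so shifting by 32 behaves like shifting by 0, while widening to `long` first and narrowing afterwards yields 0.

```java
// Hypothetical demo class; it mirrors the two expressions quoted in the PR text.
final class ShiftOverflowDemo {
    // Multiplier implied by an int left shift: the shift is done in int
    // arithmetic (count masked with 31) and only then widened to long.
    static long intShiftMultiplier(int shift) {
        return 1 << shift;
    }

    // Shift performed in long arithmetic, then narrowed back to int.
    static int narrowedLongShift(int shift) {
        return (int) (1L << shift);
    }

    public static void main(String[] args) {
        System.out.println(intShiftMultiplier(32)); // int shift by 32 acts as shift by 0
        System.out.println(narrowedLongShift(32));  // bit 32 is lost when narrowing
        int a = 7;
        // a << 32 leaves an int unchanged, so the matching multiplier must be 1, not 0.
        System.out.println((a << 32) == a * (int) intShiftMultiplier(32));
    }
}
```

Under these semantics, treating `a << 32` on an `int` as a multiplication by `(int) (1L << 32)` would wrongly give 0, which appears to be the mismatch the fix addresses.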
> > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > add micro benchmark test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 310: > 308: CON3_L = genL.next(); > 309: CON4_L = genL.next(); > 310: } Is there a reason why you are restricting the values to `powerOfTwoLongs`? I think it would be better if you just take the most general generator. private static final RestrictableGenerator GEN_INT = Generators.G.ints(); private static final RestrictableGenerator GEN_LONG = Generators.G.longs(); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1993077366 From rcastanedalo at openjdk.org Thu Mar 13 09:17:51 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 13 Mar 2025 09:17:51 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call [v2] In-Reply-To: References: Message-ID: <-KsTimAffmqmUXsjG_amR4p6mKAFOHESyCOAcZ8WO0Q=.2116a49a-0094-4af6-a00a-45dbff2da935@github.com> On Wed, 12 Mar 2025 09:55:17 GMT, Aleksey Shipilev wrote: > (If you pull from current master, GHA should become clean) Done (commit 90fd7660), thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2720528943 From rcastanedalo at openjdk.org Thu Mar 13 09:20:52 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 13 Mar 2025 09:20:52 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v2] In-Reply-To: References: Message-ID: <6DQFvCIhBHpeuVPw9ZZthZzJrwslPnyanFAvH8TdZQQ=.51d9e0da-f1aa-44af-8963-921c6f641b8f@github.com> On Thu, 13 Mar 2025 07:15:57 GMT, kuaiwei wrote: >> WIP. >> >> It worked for cases in the TestMergeLoads.java and can observe performance improvement in MergeLoadBench.getIntB . Need to check more cases. 
> > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Fix test failure Hi @kuaiwei, thanks for working on this feature! If this pull request is work in progress as noted in the description, may I suggest switching it back to "Draft" mode until you consider it ready for review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2720538541 From epeter at openjdk.org Thu Mar 13 09:31:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Mar 2025 09:31:57 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 01:18:57 GMT, Xiaohong Gong wrote: > The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. > > Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. > > This patch added the rearrange support for vector types with small lane count. 
Here are the main changes:
> - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`)
> - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation
> - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one
> 
> Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged.
> 
> 1) NEON
> 
> JMH on panama-vector:vectorIntrinsics:
> 
> Benchmark                      (size)   Mode  Cnt   Units   Before     After    Gain
> Double128Vector.rearrange        1024  thrpt   30  ops/ms   78.060   578.859   7.42x
> Double128Vector.sliceUnary       1024  thrpt   30  ops/ms   72.332  1811.664  25.05x
> Double128Vector.unsliceUnary     1024  thrpt   30  ops/ms   72.256  1812.344  25.08x
> Float64Vector.rearrange          1024  thrpt   30  ops/ms   77.879   558.797   7.18x
> Float64Vector.sliceUnary         1024  thrpt   30  ops/ms   70.528  1981.304  28.09x
> Float64Vector.unsliceUnary       1024  thrpt   30  ops/ms   71.735  1994.168  27.79x
> Int64Vector.rearrange            1024  thrpt   30  ops/ms   76.374   562.106   7.36x
> Int64Vector.sliceUnary           1024  thrpt   30  ops/ms   71.680  1190.127  16.60x
> Int64Vector.unsliceUnary         1024  thrpt   30  ops/ms   71.895  1185.094  16.48x
> Long128Vector.rearrange          1024  thrpt   30  ops/ms   78.902   579.250   7.34x
> Long128Vector.sliceUnary         1024  thrpt   30  ops/ms   72.389   747.794  10.33x
> Long128Vector.unsliceUnary       1024  thrpt   30  ops/ms   71.999   747.848  10.38x
> 
> 
> JMH on jdk mainline:
> 
> Benchmark ...

But the testing on my side so far looks good. I'll rerun once you add your IR tests.
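
As a reminder of what a rearrange computes, here is a scalar model for a 2-lane vector such as the `2D` case mentioned in the quoted description. This is a sketch under stated assumptions: the class is hypothetical, it uses plain arrays instead of the incubating Vector API, and it assumes every shuffle index is in range.

```java
import java.util.Arrays;

// Hypothetical scalar model of a vector rearrange; not the Vector API itself.
final class RearrangeModel {
    // Lane i of the result takes the source lane selected by shuffle[i].
    // Assumes 0 <= shuffle[i] < lanes.length for every i.
    static double[] rearrange(double[] lanes, int[] shuffle) {
        double[] result = new double[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            result[i] = lanes[shuffle[i]];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] v = {1.0, 2.0};
        // Swap the two lanes of a 2-lane double vector.
        System.out.println(Arrays.toString(rearrange(v, new int[] {1, 0})));
        // Broadcast lane 0 into both lanes.
        System.out.println(Arrays.toString(rearrange(v, new int[] {0, 0})));
    }
}
```

IR tests for the new match rules would presumably check that the intrinsified result equals this kind of scalar reference for each small-lane-count species.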
------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2720568965 From chagedorn at openjdk.org Thu Mar 13 09:33:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Mar 2025 09:33:59 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v12] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 07:56:10 GMT, Roland Westrelin wrote: >> This change refactors `RShiftI`/`RshiftL` `Ideal`, `Identity` and >> `Value` because the `int` and `long` versions are very similar and so >> there's no logic duplication. In the process, support for some extra >> transformations is added to `RShiftL`. I also added some new test >> cases. > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/hotspot/jtreg/compiler/c2/irTests/RShiftLNodeIdealizationTests.java > > Co-authored-by: Emanuel Peter > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Emanuel Peter Nice refactoring! I have a few small comments - mostly code style. Otherwise, looks good to me, too. src/hotspot/share/opto/mulnode.cpp line 1361: > 1359: if (in(1)->Opcode() == Op_LShift(bt) && > 1360: in(1)->req() == 3 && > 1361: in(1)->in(2) == in(2)) { Generally, is there notifaction code for this pattern to re-add the node to the IGVN worklist? If not, I don't think you need to handle it here if it's missing (it's just a missed opportunity but no correctness issue) but would be good to file a follow-up bug to handle it - especially when we want to add IGVN verification for `Ideal` and `Identity` with [JDK-8347273](https://bugs.openjdk.org/browse/JDK-8347273). 
src/hotspot/share/opto/mulnode.cpp line 1362: > 1360: in(1)->req() == 3 && > 1361: in(1)->in(2) == in(2)) { > 1362: count &= bits_per_java_integer(bt)-1; // semantics of Java shifts Suggestion: count &= bits_per_java_integer(bt) - 1; // semantics of Java shifts src/hotspot/share/opto/mulnode.cpp line 1365: > 1363: // Compute masks for which this shifting doesn't change > 1364: jlong lo = (-1 << (bits_per_java_integer(bt) - ((uint)count)-1)); // FFFF8000 > 1365: jlong hi = ~lo; // 00007FFF Seems strangely aligned. Maybe either align it to the comment above or convert to an unaligned comment. src/hotspot/share/opto/mulnode.cpp line 1383: > 1381: } > 1382: > 1383: //------------------------------Ideal------------------------------------------ I think you can remove this line Suggestion: src/hotspot/share/opto/mulnode.cpp line 1398: > 1396: // and convert to (x >> 24) & (0xFF000000 >> 24) = x >> 24 > 1397: // Such expressions arise normally from shift chains like (byte)(x >> 24). > 1398: const Node* and_node = in(1); Same as above, do we have notification code for this patterns checking? Same for the patterns in `RShiftINode::Ideal()`. src/hotspot/share/opto/mulnode.cpp line 1476: > 1474: if (t1 == TypeInteger::zero(bt)) return TypeInteger::zero(bt); > 1475: // Shift by zero does nothing > 1476: if (t2 == TypeInt::ZERO) return t1; Can you add braces here for safety? src/hotspot/share/opto/mulnode.cpp line 1490: > 1488: if (!r1->is_con() && r2->is_con()) { > 1489: uint shift = r2->get_con(); > 1490: shift &= bits_per_java_integer(bt)-1; // semantics of Java shifts Suggestion: shift &= bits_per_java_integer(bt) - 1; // semantics of Java shifts src/hotspot/share/opto/mulnode.cpp line 1509: > 1507: #ifdef ASSERT > 1508: // Make sure we get the sign-capture idiom correct. 
> 1509: if (shift == bits_per_java_integer(bt)-1) { Suggestion: if (shift == bits_per_java_integer(bt) - 1) { src/hotspot/share/opto/mulnode.hpp line 324: > 322: > 323: class RShiftNode : public Node { > 324: public: Suggestion: public: src/hotspot/share/opto/type.cpp line 1702: > 1700: } > 1701: > 1702: const TypeInteger* TypeInteger::make(jlong lo, BasicType bt) { Maybe you want to rename `lo` to `con` since we set `lo == hi`. test/hotspot/jtreg/compiler/c2/irTests/RShiftINodeIdealizationTests.java line 27: > 25: import jdk.test.lib.Asserts; > 26: import compiler.lib.ir_framework.*; > 27: Up to you if you want to update the copyright year or add your company's copyright. Same in the other test. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23438#pullrequestreview-2680958102 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1993096261 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1993018625 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1993089087 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1993019577 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1993100333 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1993016141 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1993101084 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1993101817 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1993103866 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1993108234 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1993110699 From epeter at openjdk.org Thu Mar 13 09:34:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Mar 2025 09:34:56 GMT Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2] In-Reply-To: 
References: Message-ID: <5fk0CfImI-utRMfmsA78i_uQJjoej9bsLqxsAbRZHLk=.b5aece2b-090d-4fbc-aa64-3acf8fedc41b@github.com> On Mon, 10 Mar 2025 03:00:39 GMT, Xiaohong Gong wrote: >> Since PR [1] has added several new vector operations in VectorAPI and the X86 backend implementation for them, this patch adds the AArch64 backend part for NEON/SVE architectures. >> >> The performance of Vector API relative JMH micro benchmarks can improve about 70x ~ 95x on a NVIDIA Grace CPU, which is a 128-bit vector length sve2 architecture, with different UseSVE options. Here is the gain details: >> >> >> Benchmark (size) Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2 >> ByteMaxVector.SADD 1024 thrpt 30 80.69x 79.70x 80.534x >> ByteMaxVector.SADDMasked 1024 thrpt 30 84.08x 85.72x 85.901x >> ByteMaxVector.SSUB 1024 thrpt 30 80.46x 80.27x 81.063x >> ByteMaxVector.SSUBMasked 1024 thrpt 30 83.96x 85.26x 85.887x >> ByteMaxVector.SUADD 1024 thrpt 30 80.43x 80.36x 81.761x >> ByteMaxVector.SUADDMasked 1024 thrpt 30 83.40x 84.62x 85.199x >> ByteMaxVector.SUSUB 1024 thrpt 30 79.93x 79.22x 79.714x >> ByteMaxVector.SUSUBMasked 1024 thrpt 30 82.93x 85.02x 84.726x >> ByteMaxVector.UMAX 1024 thrpt 30 78.73x 77.39x 78.220x >> ByteMaxVector.UMAXMasked 1024 thrpt 30 82.62x 84.77x 85.531x >> ByteMaxVector.UMIN 1024 thrpt 30 79.04x 77.80x 78.471x >> ByteMaxVector.UMINMasked 1024 thrpt 30 83.11x 84.86x 86.126x >> IntMaxVector.SADD 1024 thrpt 30 83.11x 83.07x 83.183x >> IntMaxVector.SADDMasked 1024 thrpt 30 90.67x 91.80x 93.162x >> IntMaxVector.SSUB 1024 thrpt 30 83.37x 82.82x 83.317x >> IntMaxVector.SSUBMasked 1024 thrpt 30 90.85x 92.87x 94.201x >> IntMaxVector.SUADD 1024 thrpt 30 82.76x 81.78x 82.679x >> IntMaxVector.SUADDMasked 1024 thrpt 30 90.49x 91.93x 93.155x >> IntMaxVector.SUSUB 1024 thrpt 30 82.92x 82.34x 82.525x >> IntMaxVector.SUSUBMasked 1024 thrpt 30 90.60x 92.12x 92.951x >> IntMaxVector.UMAX 1024 thrpt 30 82.40x 81.85x 82.242x >> IntMaxVector.UMAXMasked 1024 thrpt 30 90.30x 92.10x 92.587x >> 
IntMaxVector.UMIN 1024 thrpt 30 82.84x 81.43x 82.801x >> IntMaxVector.UMINMasked 1024 thrpt 30 90.43x 91.49x 92.678x >> LongMaxVector.SADD 102... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'jdk:master' into JDK_8349522 > - 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations > > Since PR [1] has added several new vector operations in VectorAPI > and the X86 backend implementation for them, this patch adds the > AArch64 backend part for NEON/SVE architectures. > > The performance of Vector API relative jmh micro benchmarks can > improve about 70x ~ 95x on an AArch64 128-bit vector length sve2 > architecture with different UseSVE options. Here is the uplift > details: > > ``` > Benchmark (size) Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2 > ByteMaxVector.SADD 1024 thrpt 30 80.69x 79.70x 80.534x > ByteMaxVector.SADDMasked 1024 thrpt 30 84.08x 85.72x 85.901x > ByteMaxVector.SSUB 1024 thrpt 30 80.46x 80.27x 81.063x > ByteMaxVector.SSUBMasked 1024 thrpt 30 83.96x 85.26x 85.887x > ByteMaxVector.SUADD 1024 thrpt 30 80.43x 80.36x 81.761x > ByteMaxVector.SUADDMasked 1024 thrpt 30 83.40x 84.62x 85.199x > ByteMaxVector.SUSUB 1024 thrpt 30 79.93x 79.22x 79.714x > ByteMaxVector.SUSUBMasked 1024 thrpt 30 82.93x 85.02x 84.726x > ByteMaxVector.UMAX 1024 thrpt 30 78.73x 77.39x 78.220x > ByteMaxVector.UMAXMasked 1024 thrpt 30 82.62x 84.77x 85.531x > ByteMaxVector.UMIN 1024 thrpt 30 79.04x 77.80x 78.471x > ByteMaxVector.UMINMasked 1024 thrpt 30 83.11x 84.86x 86.126x > IntMaxVector.SADD 1024 thrpt 30 83.11x 83.07x 83.183x > IntMaxVector.SADDMasked 1024 thrpt 30 90.67x 91.80x 93.162x > IntMaxVector.SSUB 1024 thrpt 30 83.37x 82.82x 83.317x > IntMaxVector.SSUBMasked 1024 thrpt 30 90.85x 92.87x 94.201x > IntMaxVector.SUADD 1024 thrpt 30 82.76x 81.78x 82.679x > IntMaxVector.SUADDMasked 1024 thrpt 30 90.49x 91.93x 93.155x > 
IntMaxVector.SUSUB 1024 thrpt 30 82.92x 82.34x 82.525x > IntMaxVector.SUSUBMasked 1024 thrpt 30 90.60x 92.12x 92.951x > IntMaxVector.UMAX 1024 thrpt 30 8... I'm getting this failure with `-XX:UseAVX=1` on x64. It is a new test you added. Failed IR Rules (1) of Methods (1) ---------------------------------- 1) Method "public void compiler.vectorapi.VectorSaturatedOperationsTest.susub_masked()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={BEFORE_MATCHING}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"avx", "true", "asimd", "true"}, counts={"_#V#SATURATING_SUB_VL#_", " >0 ", "unsigned_vector_node", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "Before matching": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(SaturatingSubV.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! * Constraint 2: "unsigned_vector_node" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23608#issuecomment-2720579042 From syan at openjdk.org Thu Mar 13 09:44:53 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 13 Mar 2025 09:44:53 GMT Subject: RFR: 8351902: RISC-V: Several tests fail after JDK-8351145 In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 08:57:48 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > These client tests seems not that useful to me, so the simple solution could be just disable them on riscv. > > Thanks! LGTM ------------- Marked as reviewed by syan (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/24027#pullrequestreview-2681188649 From duke at openjdk.org Thu Mar 13 09:53:53 2025 From: duke at openjdk.org (duke) Date: Thu, 13 Mar 2025 09:53:53 GMT Subject: RFR: 8350866: [x86] Add C1 intrinsics for CRC32-C [v4] In-Reply-To: References: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> Message-ID: On Tue, 11 Mar 2025 10:56:18 GMT, David Linus Briemann wrote: >> Local benchmarks show good improvements for the crc32c intrinsification: >> >> >> without intrinsic (master): >> >> >> $JDK/java -DmsgSize=5120 -XX:TieredStopAtLevel=1 -Xcomp TestCRC32C 300000 >> offset = 0 >> msgSize = 5120 bytes >> iters = 300000 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(byte[]) runtime = 1.186507782 seconds >> CRC32C.update(byte[]) throughput = 1294.5553525244388 MB/s >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(ByteBuffer) runtime = 1.355515648 seconds >> CRC32C.update(ByteBuffer) throughput = 1133.1481139788364 MB/s >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- >> >> >> with intrinsic: >> >> >> $JDK/java -DmsgSize=5120 -XX:TieredStopAtLevel=1 -Xcomp TestCRC32C 300000 >> offset = 0 >> msgSize = 5120 bytes >> iters = 300000 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(byte[]) runtime = 0.065003188 seconds >> CRC32C.update(byte[]) throughput = 23629.610289267657 MB/s >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(ByteBuffer) runtime = 0.072310133 seconds >> CRC32C.update(ByteBuffer) throughput = 21241.836189127185 MB/s >> 
CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- > > David Linus Briemann has updated the pull request incrementally with two additional commits since the last revision: > > - fix > - address review comments @dbriemann Your change (at version c3eb92d2bc2e336a81d0b39a41cb569b3ca1d0cc) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23826#issuecomment-2720635792 From duke at openjdk.org Thu Mar 13 09:53:53 2025 From: duke at openjdk.org (David Linus Briemann) Date: Thu, 13 Mar 2025 09:53:53 GMT Subject: RFR: 8350866: [x86] Add C1 intrinsics for CRC32-C [v4] In-Reply-To: References: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> Message-ID: On Tue, 11 Mar 2025 10:56:18 GMT, David Linus Briemann wrote: >> Local benchmarks show good improvements for the crc32c intrinsification: >> >> >> without intrinsic (master): >> >> >> $JDK/java -DmsgSize=5120 -XX:TieredStopAtLevel=1 -Xcomp TestCRC32C 300000 >> offset = 0 >> msgSize = 5120 bytes >> iters = 300000 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(byte[]) runtime = 1.186507782 seconds >> CRC32C.update(byte[]) throughput = 1294.5553525244388 MB/s >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(ByteBuffer) runtime = 1.355515648 seconds >> CRC32C.update(ByteBuffer) throughput = 1133.1481139788364 MB/s >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- >> >> >> with intrinsic: >> >> >> $JDK/java -DmsgSize=5120 -XX:TieredStopAtLevel=1 -Xcomp TestCRC32C 300000 >> offset = 0 >> msgSize = 5120 bytes >> iters = 300000 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 
0cbca9c8 >> CRC32C.update(byte[]) runtime = 0.065003188 seconds >> CRC32C.update(byte[]) throughput = 23629.610289267657 MB/s >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> CRC32C.update(ByteBuffer) runtime = 0.072310133 seconds >> CRC32C.update(ByteBuffer) throughput = 21241.836189127185 MB/s >> CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 >> ------------------------------------------------------- > > David Linus Briemann has updated the pull request incrementally with two additional commits since the last revision: > > - fix > - address review comments Thanks for reviews and testing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23826#issuecomment-2720632466 From xgong at openjdk.org Thu Mar 13 10:00:04 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 13 Mar 2025 10:00:04 GMT Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2] In-Reply-To: <5fk0CfImI-utRMfmsA78i_uQJjoej9bsLqxsAbRZHLk=.b5aece2b-090d-4fbc-aa64-3acf8fedc41b@github.com> References: <5fk0CfImI-utRMfmsA78i_uQJjoej9bsLqxsAbRZHLk=.b5aece2b-090d-4fbc-aa64-3acf8fedc41b@github.com> Message-ID: On Thu, 13 Mar 2025 09:32:20 GMT, Emanuel Peter wrote: > I'm getting this failure with `-XX:UseAVX=1` on x64. It is a new test you added. 
> > ``` > Failed IR Rules (1) of Methods (1) > ---------------------------------- > 1) Method "public void compiler.vectorapi.VectorSaturatedOperationsTest.susub_masked()" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={BEFORE_MATCHING}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"avx", "true", "asimd", "true"}, counts={"_#V#SATURATING_SUB_VL#_", " >0 ", "unsigned_vector_node", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "Before matching": > - counts: Graph contains wrong number of nodes: > * Constraint 1: "(\\d+(\\s){2}(SaturatingSubV.*)+(\\s){2}===.*vector[A-Za-z])" > - Failed comparison: [found] 0 > 0 [given] > - No nodes matched! > * Constraint 2: "unsigned_vector_node" > - Failed comparison: [found] 0 > 0 [given] > - No nodes matched! > ``` I will take a look at it. Thanks for your reporting! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23608#issuecomment-2720650861 From adinn at openjdk.org Thu Mar 13 10:22:16 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 13 Mar 2025 10:22:16 GMT Subject: RFR: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 Message-ID: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> This PR reworks the existing AArch64 ML_DSA intrinsic code generator to make it clearer to read and easier to maintain. 
------------- Commit messages: - fix whitespace errors - Clearer implementation of AArch64 dilithium generator Changes: https://git.openjdk.org/jdk/pull/24026/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24026&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350589 Stats: 983 lines in 3 files changed: 399 ins; 304 del; 280 mod Patch: https://git.openjdk.org/jdk/pull/24026.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24026/head:pull/24026 PR: https://git.openjdk.org/jdk/pull/24026 From adinn at openjdk.org Thu Mar 13 10:22:16 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 13 Mar 2025 10:22:16 GMT Subject: RFR: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 In-Reply-To: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> References: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> Message-ID: <3F6Qa5TvjBGUXvCBukcqJHG4Q4UgoqU5NmP2uMOHQAM=.cd9894a4-1d51-4b4d-9648-d5855d50d97d@github.com> On Thu, 13 Mar 2025 08:57:18 GMT, Andrew Dinn wrote: > This PR reworks the existing AArch64 ML_DSA intrinsic code generator to make it clearer to read and easier to maintain. @ferakocz I have modified your generator code to employ vector sequences and auxiliaries that handle iterative loads, stores and math/logic operations over vector sequences. It would be useful to have a review of the code from you and also for you to test it (see comments below re testing) The rewrite has allowed much of the generator logic to be condensed into calls to simple auxiliaries which provides a better mid-level view of how the code is structured. It has also clarified the register use. I think this will be a lot easier for maintainers to understand. A few further comments: 1. I have added some asserts to the montmul operations to ensure that input and output register sequences are either disjoint or overlapping. 
There may be further opportunities to add asserts in a follow-up. 2. One thing I noted (commented on in code) after switching to passing vector sequences rather than relying on fixed mappings is that some reloading of q and qinv inside loops is unnecessary as the code in the loop does not write the relevant vectors. I left the code as is so that I could check that the generated code is identical to the original but I will move the relevant load outside the loop before pushing. 3. I compared before and after disassemblies for the generated code and it is unchanged modulo routine dilithiumDecomposePoly. For that intrinsic your generator code wrote successive, intermediate results into the next unused set of 4 vectors, which are in most cases used later to hold a non-temporary result needed for a later computation. My code always writes intermediate results into the last set of 4 vectors (which are declared as `VSeq<4> vtmp(20)`). As a result the generated code has the same structure but a slightly different register mapping. I don't believe this affects performance but the change makes it clearer how the computed values are being used. 4. As well as comparing disassemblies for the generated code I verified the patch by running test `jdk/sun/security/provider/acvp/ML_DSA_Test.java`. However, I noted a problem with relying on the test as currently implemented since it did not appear to capture some errors in my code. I ran the test under the debugger and confirmed that only one of the intrinsics was being exercised (dilithiumAlmostNtt). I confirmed this by adding -XX:+PrintCompilation to the test command line. It seems that all the other calls to intrinsic candidates occurred in the interpreter, not running often enough to trigger compilation of the caller. Instead some of the `impl*Java` methods were being compiled. The last point needs some thinking about.
I worked around the limitations of the current test by adding the following compile exclusions to the test on the command line:

-XX:CompileCommand=exclude,sun.security.provider.ML_DSA::implDilithiumNttMultJava \
-XX:CompileCommand=exclude,sun.security.provider.ML_DSA::implDilithiumAlmostInverseNttJava \
-XX:CompileCommand=exclude,sun.security.provider.ML_DSA::implDilithiumMontMulByConstantJava \
-XX:CompileCommand=exclude,sun.security.provider.ML_DSA::implDilithiumAlmostNttJava \
-XX:CompileCommand=exclude,sun.security.provider.ML_DSA::decomposePolyJava

This fixes the problem, ensuring that all the intrinsics get exercised. However, there are two problems with modifying the test to pass these options automatically. Firstly, it will slow down the test on ports that don't implement the intrinsics. Secondly, it is only a partial fix -- it won't stop the same problem arising with the other tests launched by the associated test class `Launcher` (e.g. `ML_KEM` tests). Am I simply failing to spot some other test that you ran to verify correctness of the code? If that is not the case then we need to fix the current tests so we can guarantee to exercise the intrinsics. We could supply a new test that exercises the callers often enough to trigger compilation or we could modify the current test to exclude compilation of the Java implementations as appropriate to the architecture. A 3rd, more complex solution would be to modify the interpreter to call out to the generated code when dilithium intrinsics are switched off (as happens with some of the other crypto routines). That might be a useful thing to do anyway, ensuring that callers cannot gain info from a change in timing when we switch from interpreted to compiled code. I'm not sure how significant any timing disparity might be. Perhaps you can advise?
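For illustration only, the option of a test that calls the intrinsic candidates' callers often enough to get them compiled might be sketched as below. Everything named here is an assumption made for the example: `WarmupSketch`, `pointwise`, `warmUp` and the iteration count are hypothetical, and `montMul` is a plain-Java Montgomery multiplication over the ML-DSA modulus q = 8380417 that merely stands in for the real `impl*Java` methods rather than reproducing them.

```java
public class WarmupSketch {
    // ML-DSA modulus, 2^23 - 2^13 + 1 (stated here as background, not taken from the patch).
    public static final int Q = 8380417;
    // Q^-1 mod 2^32, so that Q * Q_INV == 1 in 32-bit wrap-around arithmetic.
    public static final int Q_INV = 58728449;

    // Montgomery multiplication: returns r such that r * 2^32 == a * b (mod Q).
    // Stand-in kernel only; inputs are assumed to be smaller than Q in magnitude.
    public static int montMul(int a, int b) {
        long prod = (long) a * b;
        int m = (int) prod * Q_INV;                  // low 32 bits of prod times Q^-1 (mod 2^32)
        return (int) ((prod - (long) m * Q) >> 32);  // low 32 bits cancel, so the shift is exact
    }

    // Hypothetical caller of the kernel: the kind of loop that must become hot
    // before the JIT compiles it (and could then substitute an intrinsic).
    public static void pointwise(int[] out, int[] x, int[] y) {
        for (int i = 0; i < out.length; i++) {
            out[i] = montMul(x[i], y[i]);
        }
    }

    // Invokes the caller repeatedly and returns a checksum so the work is not dead code.
    public static long warmUp(int iterations) {
        int[] x = new int[256];
        int[] y = new int[256];
        int[] out = new int[256];
        for (int i = 0; i < 256; i++) {
            x[i] = (i * 31337) % Q;
            y[i] = ((i + 7) * 4099) % Q;
        }
        long checksum = 0;
        for (int it = 0; it < iterations; it++) {
            pointwise(out, x, y);
            checksum += out[it & 255];
        }
        return checksum;
    }

    public static void main(String[] args) {
        // 20_000 is a guess; real compile thresholds depend on the VM and its flags.
        System.out.println("checksum = " + warmUp(20_000));
    }
}
```

Running such a class with -XX:+PrintCompilation, as described above, would be one way to check whether the caller actually got compiled; whether a given iteration count suffices is something to confirm per platform rather than assume.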
------------- PR Comment: https://git.openjdk.org/jdk/pull/24026#issuecomment-2720710691 From adinn at openjdk.org Thu Mar 13 10:33:42 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 13 Mar 2025 10:33:42 GMT Subject: RFR: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 [v2] In-Reply-To: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> References: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> Message-ID: > This PR reworks the existing AArch64 ML_DSA intrinsic code generator to make it clearer to read and easier to maintain. Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: fix errors in comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24026/files - new: https://git.openjdk.org/jdk/pull/24026/files/cc3aba6a..3faa59be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24026&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24026&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24026.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24026/head:pull/24026 PR: https://git.openjdk.org/jdk/pull/24026 From mli at openjdk.org Thu Mar 13 10:47:56 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 13 Mar 2025 10:47:56 GMT Subject: RFR: 8351902: RISC-V: Several tests fail after JDK-8351145 In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 09:42:28 GMT, SendaoYan wrote: > LGTM Thank you! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24027#issuecomment-2720799521 From adinn at openjdk.org Thu Mar 13 10:49:52 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 13 Mar 2025 10:49:52 GMT Subject: RFR: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 [v2] In-Reply-To: References: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> Message-ID: On Thu, 13 Mar 2025 10:33:42 GMT, Andrew Dinn wrote: >> This PR reworks the existing AArch64 ML_DSA intrinsic code generator to make it clearer to read and easier to maintain. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > fix errors in comments @ferakocz I see that the test problem is being addressed as part of the x86 ML_DSA PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24026#issuecomment-2720801990 From duke at openjdk.org Thu Mar 13 11:00:04 2025 From: duke at openjdk.org (duke) Date: Thu, 13 Mar 2025 11:00:04 GMT Subject: RFR: 8330469: C2: Remove or change "PrintOpto && VerifyLoopOptimizations" as printing code condition [v3] In-Reply-To: References: <0u5jTfXdZR63O78QixBJEql2WKWLvcyaeqsN_gAL3OA=.40f718c3-b99e-4183-80e1-fb3b5dc6074e@github.com> Message-ID: On Wed, 12 Mar 2025 10:51:43 GMT, Saranya Natarajan wrote: >> **Issue:** There are currently 9 occurrences where we guard printing code with PrintOpto && VerifyLoopOptimizations. This flag combo is never really used in practice. >> >> **Solution**: I analysed the 9 occurrences. In cases where the flag `PrintOpto && VerifyLoopOptimizations` was followed by the flag `TraceLoopOpts` with `else if` or the `||` operator, I removed the former flag. In other cases, where `PrintOpto && VerifyLoopOptimizations` was the only flag, it was replaced with `TraceLoopOpts`.
>> >> **Test Result**: Link to [GitHub Action](https://github.com/sarannat/jdk/actions/runs/13723071055) run on commit [91ecc51](https://github.com/sarannat/jdk/commit/91ecc5190ce31da94bded4de210136f337286e69) > > Saranya Natarajan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8330469 > - JDK-8330469: Addressing review comments by removing TraceLoopOpts and some dump() > - 8330469 : Removed or replaced (PrintOpto && VerifyLoopOptimizations) with TraceLoopOpts @sarannat Your change (at version 3ee2e887d9e03f549b8acfc18562e919387791cc) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23959#issuecomment-2720835108 From duke at openjdk.org Thu Mar 13 11:05:12 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Thu, 13 Mar 2025 11:05:12 GMT Subject: Integrated: 8330469: C2: Remove or change "PrintOpto && VerifyLoopOptimizations" as printing code condition In-Reply-To: <0u5jTfXdZR63O78QixBJEql2WKWLvcyaeqsN_gAL3OA=.40f718c3-b99e-4183-80e1-fb3b5dc6074e@github.com> References: <0u5jTfXdZR63O78QixBJEql2WKWLvcyaeqsN_gAL3OA=.40f718c3-b99e-4183-80e1-fb3b5dc6074e@github.com> Message-ID: On Mon, 10 Mar 2025 10:34:19 GMT, Saranya Natarajan wrote: > **Issue:** There are currently 9 occurrences where we guard printing code with PrintOpto && VerifyLoopOptimizations. This flag combo is never really used in practice. > > **Solution**: I analysed the 9 occurrences. In cases where the flag `PrintOpto && VerifyLoopOptimizations` was followed by the flag `TraceLoopOpts` with `else if` or the `||` operator, I removed the former flag. In other cases, where `PrintOpto && VerifyLoopOptimizations` was the only flag, it was replaced with `TraceLoopOpts`.
> > **Test Result**: Link to [GitHub Action](https://github.com/sarannat/jdk/actions/runs/13723071055) run on commit [91ecc51](https://github.com/sarannat/jdk/commit/91ecc5190ce31da94bded4de210136f337286e69) This pull request has now been integrated. Changeset: 9c003314 Author: Saranya Natarajan Committer: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/9c00331465fe83e491f6dd1e6df4df1fb790f2fc Stats: 38 lines in 3 files changed: 0 ins; 34 del; 4 mod 8330469: C2: Remove or change "PrintOpto && VerifyLoopOptimizations" as printing code condition Reviewed-by: chagedorn, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/23959 From jbhateja at openjdk.org Thu Mar 13 11:06:12 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 13 Mar 2025 11:06:12 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v12] In-Reply-To: <7vVEaiKEa__OeHQl8RjoNE_egAjB4zHpdQ3OKYLgbaI=.8fb97820-8e96-4761-8887-05ea5f4c2373@github.com> References: <7vVEaiKEa__OeHQl8RjoNE_egAjB4zHpdQ3OKYLgbaI=.8fb97820-8e96-4761-8887-05ea5f4c2373@github.com> Message-ID: <1yyFp-w0337hN271pnv4E5uk7bF-u7coXGTDOrdQyB4=.f4cb21ae-a1f2-43bd-88ed-4697c5a23ca4@github.com> On Wed, 12 Mar 2025 20:52:07 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. >> >> For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > undo changes to testing implementation Hi @vamsi-parasa, some minor comments. Otherwise patch looks good to me.
src/hotspot/cpu/x86/assembler_x86.cpp line 339: > 337: assert(op1 == 0x81, "Unexpected opcode"); > 338: if (is8bit(imm32)) { > 339: emit_int24(op1 | 0x02, // set sign bit Hi @vamsi-parasa, This is a nice code-cache-friendly optimization for arithmetic instructions with immediate operands. If the immediate value fits in 8 bits, we replace the instruction with an equivalent instruction that has a shorter encoding. Please add a few comments to it. We save 3 bytes per encoding, which offsets the EVEX encoding penalty. src/hotspot/cpu/x86/x86_64.ad line 1550: > 1548: { > 1549: int offset = ra_->reg2offset(in_RegMask(0).find_first_elem()); > 1550: int reg = ra_->get_encode(this); Please remove this line; `reg` is not used. src/hotspot/cpu/x86/x86_64.ad line 6274: > 6272: ins_encode %{ > 6273: __ ecmovl(Assembler::parity, $dst$$Register, $src1$$Register, $src2$$Register); > 6274: __ ecmovl(Assembler::notEqual, $dst$$Register, $src1$$Register, $src2$$Register); FTR, we need this special handling, the unordered check fixup (UCF), only for eq / neq comparisons, since NaN == NaN is false and the parity flag bit is set if any of the operands is a NaN.
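As a small illustration of the Java-level behaviour behind the last comment, the sketch below shows what a conditional move on a floating-point eq / neq test has to produce when an operand is NaN. The class and method names are hypothetical and not part of the patch. The connection to the flags is an assumption about the compare feeding the cmove: the x86 `ucomiss`/`ucomisd` family reports an unordered result by setting ZF, PF and CF together, so a `notEqual` condition on its own would see "equal" for a NaN operand, which is why the parity case needs to be handled as well.

```java
public class UnorderedCompareSketch {
    // Java semantics for "a != b ? x : y": any NaN operand makes the test true.
    public static int selectIfNotEqual(float a, float b, int ifNotEqual, int otherwise) {
        return (a != b) ? ifNotEqual : otherwise;
    }

    // Java semantics for "a == b ? x : y": any NaN operand makes the test false.
    public static int selectIfEqual(float a, float b, int ifEqual, int otherwise) {
        return (a == b) ? ifEqual : otherwise;
    }

    public static void main(String[] args) {
        float nan = Float.NaN;
        // NaN is unordered with everything, including itself.
        System.out.println("NaN == NaN      : " + (nan == nan));
        System.out.println("NaN != NaN      : " + (nan != nan));
        System.out.println("select (neq)    : " + selectIfNotEqual(nan, 1.0f, 1, 2));
        System.out.println("select (eq)     : " + selectIfEqual(nan, nan, 1, 2));
        System.out.println("select (1 != 1) : " + selectIfNotEqual(1.0f, 1.0f, 1, 2));
    }
}
```

Ordered comparisons such as `<` or `>=` simply evaluate to false when an operand is NaN, which matches the reviewer's remark that only the eq / neq forms need the extra fixup.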
------------- PR Review: https://git.openjdk.org/jdk/pull/23501#pullrequestreview-2680914108 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1993243015 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1992992553 PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r1993001495 From jbhateja at openjdk.org Thu Mar 13 11:09:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 13 Mar 2025 11:09:54 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v12] In-Reply-To: <7vVEaiKEa__OeHQl8RjoNE_egAjB4zHpdQ3OKYLgbaI=.8fb97820-8e96-4761-8887-05ea5f4c2373@github.com> References: <7vVEaiKEa__OeHQl8RjoNE_egAjB4zHpdQ3OKYLgbaI=.8fb97820-8e96-4761-8887-05ea5f4c2373@github.com> Message-ID: On Wed, 12 Mar 2025 20:52:07 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. >> >> For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > undo changes to testing implementation We also need to create a follow-up PR for Extended EVEX to REX2/REX demotions, this will require changes in the assembler layer and the compiler. I have already created one JBS for the RA side of the work. 
https://bugs.openjdk.org/browse/JDK-8351016 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2720863576 From duke at openjdk.org Thu Mar 13 11:18:10 2025 From: duke at openjdk.org (David Linus Briemann) Date: Thu, 13 Mar 2025 11:18:10 GMT Subject: Integrated: 8350866: [x86] Add C1 intrinsics for CRC32-C In-Reply-To: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> References: <-01lK62KkjB6U0yx64PznUfEnGn6W649vuiz1Mp3NLU=.8acc8f2b-907c-4fed-bdb2-a1630c1eb824@github.com> Message-ID: On Thu, 27 Feb 2025 14:30:42 GMT, David Linus Briemann wrote: > Local benchmarks show good improvements for the crc32c intrinsification: > > > without intrinsic (master): > > > $JDK/java -DmsgSize=5120 -XX:TieredStopAtLevel=1 -Xcomp TestCRC32C 300000 > offset = 0 > msgSize = 5120 bytes > iters = 300000 > ------------------------------------------------------- > CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 > CRC32C.update(byte[]) runtime = 1.186507782 seconds > CRC32C.update(byte[]) throughput = 1294.5553525244388 MB/s > CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 > ------------------------------------------------------- > CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 > CRC32C.update(ByteBuffer) runtime = 1.355515648 seconds > CRC32C.update(ByteBuffer) throughput = 1133.1481139788364 MB/s > CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 > ------------------------------------------------------- > > > with intrinsic: > > > $JDK/java -DmsgSize=5120 -XX:TieredStopAtLevel=1 -Xcomp TestCRC32C 300000 > offset = 0 > msgSize = 5120 bytes > iters = 300000 > ------------------------------------------------------- > CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 > CRC32C.update(byte[]) runtime = 0.065003188 seconds > CRC32C.update(byte[]) throughput = 23629.610289267657 MB/s > CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 > ------------------------------------------------------- > CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 
> CRC32C.update(ByteBuffer) runtime = 0.072310133 seconds > CRC32C.update(ByteBuffer) throughput = 21241.836189127185 MB/s > CRCs: crc = 0cbca9c8, crcReference = 0cbca9c8 > ------------------------------------------------------- This pull request has now been integrated. Changeset: 4c5956d7 Author: David Linus Briemann Committer: Martin Doerr URL: https://git.openjdk.org/jdk/commit/4c5956d7481e043c35f5dc78f095516288a00a2e Stats: 71 lines in 3 files changed: 65 ins; 0 del; 6 mod 8350866: [x86] Add C1 intrinsics for CRC32-C Reviewed-by: mdoerr, kvn ------------- PR: https://git.openjdk.org/jdk/pull/23826 From fyang at openjdk.org Thu Mar 13 12:08:57 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Mar 2025 12:08:57 GMT Subject: RFR: 8351839: RISC-V: Fix base offset calculation introduced in JDK-8347489 In-Reply-To: References: Message-ID: <4efrefaKiusbGLOaspFCggcDCT8mE4HXRoycDLrJpP4=.80fcb15b-8303-4255-8c3f-a6d04fef059a@github.com> On Thu, 13 Mar 2025 07:31:22 GMT, Feilong Jiang wrote: > looks good, thanks. Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24006#issuecomment-2721026759 From fyang at openjdk.org Thu Mar 13 12:08:58 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Mar 2025 12:08:58 GMT Subject: Integrated: 8351839: RISC-V: Fix base offset calculation introduced in JDK-8347489 In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 11:24:23 GMT, Fei Yang wrote: > As discussed in https://github.com/openjdk/jdk/pull/23633#discussion_r1974591975, there is no need to distinguish `T_BYTE` and `T_CHAR` when calculating base offset for strings. > The reason is that the low-level character storage used for both Latin1 and UTF16 strings is always a byte array [1]. > So we should always use `T_BYTE` for both cases. This won't make a difference on the calculated base offset for now. > But it's better to fix this for code readability purposes. Sanity tested on linux-riscv64 w/wo COH.
> > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L160 This pull request has now been integrated. Changeset: 375722f4 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/375722f4ab62865c45d8d76f01dc9c7209be57c8 Stats: 14 lines in 2 files changed: 0 ins; 6 del; 8 mod 8351839: RISC-V: Fix base offset calculation introduced in JDK-8347489 Reviewed-by: mli, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/24006 From chagedorn at openjdk.org Thu Mar 13 12:17:23 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Mar 2025 12:17:23 GMT Subject: RFR: 8351938: C2: Print compilation bailouts with PrintCompilation compile command Message-ID: We currently only print a compilation bailout with `-XX:+PrintCompilation`: 7782 90 b 4 Test::main (19 bytes) 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. Thanks, Christian ------------- Commit messages: - 8351938: C2: Print compilation bailouts with PrintCompilation compile command Changes: https://git.openjdk.org/jdk/pull/24031/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24031&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351938 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24031.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24031/head:pull/24031 PR: https://git.openjdk.org/jdk/pull/24031 From thartmann at openjdk.org Thu Mar 13 12:17:24 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 13 Mar 2025 12:17:24 GMT Subject: RFR: 8351938: C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: References: Message-ID: <42Qy49mFH-f2v13CWWlt7g2wNrITP-9ws-dqIZgJ3V0=.fdc2835e-6f8b-4a92-a2eb-21a1f72683e3@github.com> On Thu, 13 Mar 2025 12:07:50 GMT, Christian Hagedorn wrote: > We currently only print a compilation bailout with 
`-XX:+PrintCompilation`: > > 7782 90 b 4 Test::main (19 bytes) > 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout > > But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. > > Thanks, > Christian Looks good. I'll close [JDK-8318890](https://bugs.openjdk.org/browse/JDK-8318890) as dup then ;) ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24031#pullrequestreview-2681660792 From epeter at openjdk.org Thu Mar 13 12:17:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Mar 2025 12:17:23 GMT Subject: RFR: 8351938: C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 12:07:50 GMT, Christian Hagedorn wrote: > We currently only print a compilation bailout with `-XX:+PrintCompilation`: > > 7782 90 b 4 Test::main (19 bytes) > 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout > > But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. > > Thanks, > Christian Perfect, this has been bothering me for a while too! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24031#pullrequestreview-2681653976 From chagedorn at openjdk.org Thu Mar 13 12:21:03 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Mar 2025 12:21:03 GMT Subject: RFR: 8351938: C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 12:07:50 GMT, Christian Hagedorn wrote: > We currently only print a compilation bailout with `-XX:+PrintCompilation`: > > 7782 90 b 4 Test::main (19 bytes) > 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout > > But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. > > Thanks, > Christian Thanks for your reviews! 
> I'll close [JDK-8318890](https://bugs.openjdk.org/browse/JDK-8318890) as dup then ;) Sounds good, thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24031#issuecomment-2721062717 From fyang at openjdk.org Thu Mar 13 12:22:52 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Mar 2025 12:22:52 GMT Subject: RFR: 8351902: RISC-V: Several tests fail after JDK-8351145 In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 08:57:48 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > These client tests seem not that useful to me, so the simple solution could be just to disable them on riscv. > > Thanks! Hi, Thanks for taking care of this. I have got a question here. Do you know why other tests (Let's say `TestUseSHA256IntrinsicsOptionOnSupportedCPU.java` & `TestUseSHA512IntrinsicsOptionOnSupportedCPU.java`) in the same directory are not affected? They look quite similar to the ones modified in this PR. But they pass on my linux-riscv64 platform where there is no support for SHA256/512. Interesting! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24027#issuecomment-2721060080 From rcastanedalo at openjdk.org Thu Mar 13 12:34:00 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 13 Mar 2025 12:34:00 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: References: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> Message-ID: On Wed, 12 Mar 2025 14:43:15 GMT, Quan Anh Mai wrote: > > A store of a char value into a long[] array would be represented at the IR level as a conversion (ConvI2L) followed by a StoreL, no? > > No, a code such as this `MemorySegment.ofArray(longArray).set(ValueLayout.JAVA_SHORT, offset, c)` would produce a `StoreC` into a `long[]`. 
Right, in this case I interpret from the [comment at the declaration of `MemNode::memory_type()`](https://github.com/openjdk/jdk/blob/375722f4ab62865c45d8d76f01dc9c7209be57c8/src/hotspot/share/opto/memnode.hpp#L136) that the `memory_type()` of the StoreC node should be `T_SHORT` (the type of the value stored by the node), as opposed to the current `T_CHAR`. I propose to address this in a separate RFE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2721096572 From duke at openjdk.org Thu Mar 13 12:36:56 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 13 Mar 2025 12:36:56 GMT Subject: RFR: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 In-Reply-To: <3F6Qa5TvjBGUXvCBukcqJHG4Q4UgoqU5NmP2uMOHQAM=.cd9894a4-1d51-4b4d-9648-d5855d50d97d@github.com> References: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> <3F6Qa5TvjBGUXvCBukcqJHG4Q4UgoqU5NmP2uMOHQAM=.cd9894a4-1d51-4b4d-9648-d5855d50d97d@github.com> Message-ID: <7A298xkGx3GJ2Pt6yIEK_DABrosqrbpYXH905hVUyxc=.a7555ea3-ef86-4bd5-a569-c1757ac6f1ab@github.com> On Thu, 13 Mar 2025 10:15:49 GMT, Andrew Dinn wrote: > @ferakocz I have modified your generator code to employ vector sequences and auxiliaries that handle iterative loads, stores and math/logic operations over vector sequences. It would be useful to have a review of the code from you and also for you to test it (see comments below re testing) The rewrite has allowed much of the generator logic to be condensed into calls to simple auxiliaries which provides a better mid-level view of how the code is structured. It has also clarified the register use. I think this will be a lot easier for maintainers to understand. A few further comments: > > 1. I have added some asserts to the montmul operations to ensure that input and output register sequences are either disjoint or overlapping. 
There may be further opportunities to add asserts in a follow-up. > 2. One thing I noted (commented on in code) after switching to passing vector sequences rather than relying on fixed mappings is that some reloading of q and qinv inside loops is unnecessary as the code in the loop does not write the relevant vectors. I left the code as is so that I could check that the generated code is identical to the original but I will move the relevant load outside the loop before pushing. > 3. I compared before and after disassemblies of the generated code and it is unchanged modulo routine `dilithiumDecomposePoly`. For that intrinsic your generator code wrote successive, intermediate results into the next unused set of 4 vectors, which are in most cases used subsequently to hold a non-temporary result needed by a later computation. My code always writes intermediate results into the last set of 4 vectors (declared as `VSeq<4> vtmp(20)`). As a result my generated code has the same structure but a slightly different register mapping to yours. I don't believe this affects performance but the change does make it clearer how the computed values are being used. > 4. As well as comparing disassemblies for the generated code I verified the patch by running test `jdk/sun/security/provider/acvp/ML_DSA_Test.java`. However, I noted a problem with relying on the test as currently implemented since it did not appear to capture some errors in my code. I re-ran the test under the debugger and found that only one of the intrinsics was being exercised (dilithiumAlmostNtt). I confirmed this by adding -XX:+PrintCompilation to the test command line. It seems that all the calls to other intrinsic candidates occurred from the interpreter and did not run often enough to trigger compilation of the caller. Instead some of the `impl*Java` methods were being compiled. > > The last point needs some thinking about. 
I worked around the limitations of the current test by adding the following compile exclusions to the test on the command line: > > ``` > -XX:CompileCommand=exclude,sun.security.provider.ML_DSA::implDilithiumNttMultJava \ > -XX:CompileCommand=exclude,sun.security.provider.ML_DSA::implDilithiumAlmostInverseNttJava \ > -XX:CompileCommand=exclude,sun.security.provider.ML_DSA::implDilithiumMontMulByConstantJava \ > -XX:CompileCommand=exclude,sun.security.provider.ML_DSA::implDilithiumAlmostNttJava \ > -XX:CompileCommand=exclude,sun.security.provider.ML_DSA::decomposePolyJava`dilithiumAlmostNtt > ``` > > This fixes the immediate problem, ensuring that all the intrinsics get exercised. However, there are two issues with modifying the test to pass these options automatically. Firstly, it will slow down the test on ports that don't implement the intrinsics. Secondly, it is only a partial fix -- it won't stop the same problem arising with the other tests launched by the associated test class `Launcher` (e.g. `ML_KEM` tests). > > Am I simply failing to spot some other test that you ran to verify correctness of the code? If that is not the case then we need to fix the current tests so we can guarantee to exercise the intrinsics. We could supply a new test that exercises the callers often enough to trigger compilation or we could modify the current test to exclude compilation of the Java implementations as appropriate to the architecture. A 3rd, more complex solution would be to modify the interpreter to call out to the generated code when dilithium intrinsics are switched off (as happens with some of the other crypto routines). That might be a useful thing to do anyway, ensuring that callers cannot gain info from a change in timing when we switch from interpreted to compiled code. I'm not sure how significant any timing disparity might be. Perhaps you can advise? 
Hi, @adinn , I have another PR for ML-DSA AVX-512 intrinsics, in which I am changing the test jdk/sun/security/provider/acvp/Launcher.java (that calls ML_DSA_Test.run() adding -Xcomp to the command line and that results in invoking the intrinsics on the first call. That PR is https://github.com/openjdk/jdk/pull/23860 . ------------- PR Comment: https://git.openjdk.org/jdk/pull/24026#issuecomment-2721099943 From duke at openjdk.org Thu Mar 13 12:36:57 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 13 Mar 2025 12:36:57 GMT Subject: RFR: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 In-Reply-To: <7A298xkGx3GJ2Pt6yIEK_DABrosqrbpYXH905hVUyxc=.a7555ea3-ef86-4bd5-a569-c1757ac6f1ab@github.com> References: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> <3F6Qa5TvjBGUXvCBukcqJHG4Q4UgoqU5NmP2uMOHQAM=.cd9894a4-1d51-4b4d-9648-d5855d50d97d@github.com> <7A298xkGx3GJ2Pt6yIEK_DABrosqrbpYXH905hVUyxc=.a7555ea3-ef86-4bd5-a569-c1757ac6f1ab@github.com> Message-ID: On Thu, 13 Mar 2025 12:32:29 GMT, Ferenc Rakoczi wrote: >> @ferakocz I have modified your generator code to employ vector sequences and auxiliaries that handle iterative loads, stores and math/logic operations over vector sequences. It would be useful to have a review of the code from you and also for you to test it (see comments below re testing) >> The rewrite has allowed much of the generator logic to be condensed into calls to simple auxiliaries which provides a better mid-level view of how the code is structured. It has also clarified the register use. I think this will be a lot easier for maintainers to understand. >> A few further comments: >> 1. I have added some asserts to the montmul operations to ensure that input and output register sequences are either disjoint or overlapping. There may be further opportunities to add asserts in a follow-up. >> 2. 
One thing I noted (commented on in code) after switching to passing vector sequences rather than relying on fixed mappings is that some reloading of q and qinv inside loops is unnecessary as the code in the loop does not write the relevant vectors. I left the code as is so that I could check that the generated code is identical to the original but I will move the relevant load outside the loop before pushing. >> 3. I compared before and after disassemblies of the generated code and it is unchanged modulo routine `dilithiumDecomposePoly`. For that intrinsic your generator code wrote successive, intermediate results into the next unused set of 4 vectors, which are in most cases used subsequently to hold a non-temporary result needed by a later computation. My code always writes intermediate results into the last set of 4 vectors (declared as `VSeq<4> vtmp(20)`). As a result my generated code has the same structure but a slightly different register mapping to yours. I don't believe this affects performance but the change does make it clearer how the computed values are being used. >> 4. As well as comparing disassemblies for the generated code I verified the patch by running test `jdk/sun/security/provider/acvp/ML_DSA_Test.java`. However, I noted a problem with relying on the test as currently implemented since it did not appear to capture some errors in my code. I re-ran the test under the debugger and found that only one of the intrinsics was being exercised (dilithiumAlmostNtt). I confirmed this by adding -XX:+PrintCompilation to the test command line. It seems that all the calls to other intrinsic candidates occurred from the interpreter and did not run often eno... > @ferakocz I see that the test problem is being addressed as part of the x86 ML_DSA PR. Oh, so you have already found that :-) . ------------- PR Comment: https://git.openjdk.org/jdk/pull/24026#issuecomment-2721105194 From hgreule at openjdk.org Thu Mar 13 13:19:41 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 13 Mar 2025 13:19:41 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations [v2] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 12:34:36 GMT, Emanuel Peter wrote: >> Hannes Greule has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - use Generators >> - after merge cleanup >> - Merge branch 'master' into involution-nodes >> - collapse impl, add more fitting nodes >> - test > > test/hotspot/jtreg/compiler/c2/irTests/InvolutionIdentityTests.java line 83: > >> 81: assertResultF(nanf); >> 82: >> 83: double ad = RunInfo.getRandom().nextDouble(); > > This actually only generates values between `0.0...1.0`. > > Can you instead use `Generators.java`? It will make sure to generate "interesting" values, including different encodings of `NaN`, infinity, etc. Good catch, I'm using Generators now. I'm currently not testing ReverseBytesS/US, but there also isn't support for short/char in Generators. Should I still add tests? 
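
The review point above (that `nextDouble()` only covers `0.0...1.0`) can be illustrated with a minimal sketch in plain Java. It does not use the `Generators` test library; the class and method names below are hypothetical, and the bit-pattern approach is only one assumed way of reaching special values such as NaN and infinity.

```java
import java.util.Random;

public class RandomDoubleCoverage {
    // Returns true if every sampled nextDouble() value lies in [0.0, 1.0).
    static boolean nextDoubleStaysInUnitInterval(Random r, int samples) {
        for (int i = 0; i < samples; i++) {
            double d = r.nextDouble();
            if (!(d >= 0.0 && d < 1.0)) {
                return false;
            }
        }
        return true;
    }

    // Reinterprets a random 64-bit pattern as a double. Unlike nextDouble(),
    // this can yield negative values, infinities and NaN encodings.
    static double fromRandomBits(Random r) {
        return Double.longBitsToDouble(r.nextLong());
    }

    public static void main(String[] args) {
        Random r = new Random(42);
        System.out.println("nextDouble stays in [0,1): "
                + nextDoubleStaysInUnitInterval(r, 1_000_000));
        // Hand-picked encodings that nextDouble() never produces.
        System.out.println(Double.longBitsToDouble(0x7ff8000000000000L)); // a NaN encoding
        System.out.println(Double.longBitsToDouble(0xfff0000000000000L)); // negative infinity
        System.out.println(fromRandomBits(r));
    }
}
```

A generator that mixes such hand-picked encodings with random bit patterns exercises far more of the floating-point value space than a uniform sample from the unit interval.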
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23851#discussion_r1993500894 From hgreule at openjdk.org Thu Mar 13 13:19:41 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 13 Mar 2025 13:19:41 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations [v2] In-Reply-To: References: Message-ID: > subnode has multiple nodes that are self-inverse but lacking the respective optimization. ReverseINode and ReverseLNode already have the optimization, but we can deduplicate the code for all those operations. > > For most nodes, the optimization is obvious. The NegF/DNodes however are worth looking at in detail imo: > - `Float.NaN` has the same bits set as `-Float.NaN`. That means, in this specific case, the operation is a no-op anyway > - For other values, the msb is flipped, flipping twice results in the original value again. > > Similar changes could be made to the corresponding vector nodes. If you want, I can do that in a follow-up RFE. > > One note: During benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like > > int v = 0; > for (int datum : data) { > v ^= Integer.reverseBytes(Integer.reverseBytes(datum)); > } > return v; > > was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases. Hannes Greule has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - use Generators - after merge cleanup - Merge branch 'master' into involution-nodes - collapse impl, add more fitting nodes - test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23851/files - new: https://git.openjdk.org/jdk/pull/23851/files/41c555cd..b82ed237 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23851&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23851&range=00-01 Stats: 65786 lines in 1301 files changed: 29325 ins; 24767 del; 11694 mod Patch: https://git.openjdk.org/jdk/pull/23851.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23851/head:pull/23851 PR: https://git.openjdk.org/jdk/pull/23851 From epeter at openjdk.org Thu Mar 13 13:23:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Mar 2025 13:23:54 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations [v2] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 13:16:31 GMT, Hannes Greule wrote: >> test/hotspot/jtreg/compiler/c2/irTests/InvolutionIdentityTests.java line 83: >> >>> 81: assertResultF(nanf); >>> 82: >>> 83: double ad = RunInfo.getRandom().nextDouble(); >> >> This actually only generates values between `0.0...1.0`. >> >> Can you instead use `Generators.java`? It will make sure to generate "interesting" values, including different encodings of `NaN`, infinity, etc. > > Good catch, I'm using Generators now. I'm currently not testing ReverseBytesS/US, but there also isn't support for short/char in Generators. Should I still add tests? 
You can always restrict ranges: private static final RestrictableGenerator GEN_BYTE = Generators.G.safeRestrict(Generators.G.ints(), Byte.MIN_VALUE, Byte.MAX_VALUE); private static final RestrictableGenerator GEN_CHAR = Generators.G.safeRestrict(Generators.G.ints(), Character.MIN_VALUE, Character.MAX_VALUE); private static final RestrictableGenerator GEN_SHORT = Generators.G.safeRestrict(Generators.G.ints(), Short.MIN_VALUE, Short.MAX_VALUE); private static final RestrictableGenerator GEN_INT = Generators.G.ints(); private static final RestrictableGenerator GEN_LONG = Generators.G.longs(); private static final Generator GEN_FLOAT = Generators.G.floats(); private static final Generator GEN_DOUBLE = Generators.G.doubles(); private static final RestrictableGenerator GEN_BOOLEAN = Generators.G.safeRestrict(Generators.G.ints(), 0, 1); > I'm currently not testing ReverseBytesS/US, but there also isn't support for short/char in Generators. Should I still add tests? And yes, more tests would always be better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23851#discussion_r1993509063 From eastigeevich at openjdk.org Thu Mar 13 13:57:06 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 13 Mar 2025 13:57:06 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 18:54:10 GMT, Vladimir Kozlov wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. 
>> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > src/hotspot/share/code/nmethod.cpp line 1396: > >> 1394: } >> 1395: >> 1396: nmethod::nmethod(nmethod& nm) : CodeBlob(nm.name(), CodeBlobKind::Nmethod, nm.size(), nm.header_size()) > > Should this be `clone()` method instead of constructor. Then you will not need `new()`. +1 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1993581509 From eastigeevich at openjdk.org Thu Mar 13 13:57:08 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 13 Mar 2025 13:57:08 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: <3EIwREkAHbNJoOGW6c9NKb9f3_yqFlPrIQ6ppKC7zf0=.32d943bb-bec2-4e64-b941-721872548a28@github.com> On Tue, 11 Feb 2025 22:05:13 GMT, Chad Rakoczy wrote: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. src/hotspot/share/oops/method.hpp line 376: > 374: > 375: public: > 376: static void set_code(const methodHandle& mh, nmethod* code, bool isRelocation); I see you have added `isRelocation` to pass the assert. 
We don't need it because we don't have plans to relocate compiled-to-native wrappers at the moment. We only support relocation of compiled code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1993577593 From hgreule at openjdk.org Thu Mar 13 14:00:27 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 13 Mar 2025 14:00:27 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations [v3] In-Reply-To: References: Message-ID: > subnode has multiple nodes that are self-inverse but lacking the respective optimization. ReverseINode and ReverseLNode already have the optimization, but we can deduplicate the code for all those operations. > > For most nodes, the optimization is obvious. The NegF/DNodes however are worth looking at in detail imo: > - `Float.NaN` has the same bits set as `-Float.NaN`. That means, in this specific case, the operation is a no-op anyway > - For other values, the msb is flipped, flipping twice results in the original value again. > > Similar changes could be made to the corresponding vector nodes. If you want, I can do that in a follow-up RFE. > > One note: During benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like > > int v = 0; > for (int datum : data) { > v ^= Integer.reverseBytes(Integer.reverseBytes(datum)); > } > return v; > > was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases. 
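
The self-inverse property described above can be checked at the Java level with a small sketch. This is only an illustration under stated assumptions: the class and method names are hypothetical, it verifies the library methods and the negation operator rather than the C2 `Identity` transformation itself, and the floating-point check compares raw bit patterns for a few ordinary and special values.

```java
public class InvolutionDemo {
    // Reversing bytes or bits twice restores the original int.
    static boolean intRoundTrips(int x) {
        return Integer.reverseBytes(Integer.reverseBytes(x)) == x
            && Integer.reverse(Integer.reverse(x)) == x;
    }

    // Same property for long.
    static boolean longRoundTrips(long x) {
        return Long.reverseBytes(Long.reverseBytes(x)) == x
            && Long.reverse(Long.reverse(x)) == x;
    }

    // short and char only offer reverseBytes, not a bit reversal.
    static boolean shortAndCharRoundTrip(short s, char c) {
        return Short.reverseBytes(Short.reverseBytes(s)) == s
            && Character.reverseBytes(Character.reverseBytes(c)) == c;
    }

    // Negating twice flips the sign bit twice, so the raw bits are unchanged.
    static boolean floatNegRoundTrips(float f) {
        float twice = -(-f);
        return Float.floatToRawIntBits(twice) == Float.floatToRawIntBits(f);
    }

    static boolean doubleNegRoundTrips(double d) {
        double twice = -(-d);
        return Double.doubleToRawLongBits(twice) == Double.doubleToRawLongBits(d);
    }

    public static void main(String[] args) {
        System.out.println(intRoundTrips(0x12345678));
        System.out.println(longRoundTrips(0x0123456789abcdefL));
        System.out.println(shortAndCharRoundTrip((short) 0x1234, '\u1234'));
        for (float f : new float[] {1.5f, -0.0f, Float.POSITIVE_INFINITY, Float.MIN_VALUE}) {
            System.out.println(floatNegRoundTrips(f));
        }
        System.out.println(doubleNegRoundTrips(-2.25));
    }
}
```

Because each operation undoes itself, an optimizer may replace `op(op(x))` with `x`, which is what the consolidated `Identity` is described as doing for these nodes.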
Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: add tests for ReverseBytesS/ReverseBytesUS ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23851/files - new: https://git.openjdk.org/jdk/pull/23851/files/b82ed237..0a48b5b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23851&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23851&range=01-02 Stats: 58 lines in 1 file changed: 54 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23851.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23851/head:pull/23851 PR: https://git.openjdk.org/jdk/pull/23851 From hgreule at openjdk.org Thu Mar 13 14:00:27 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 13 Mar 2025 14:00:27 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations [v3] In-Reply-To: References: Message-ID: <8C9sjYGvmzGH7JdwHSRsFg8gJfdmk_Mu3z63Avvyj2g=.7ba67b9a-cf43-4453-84d8-c7c15c37963f@github.com> On Thu, 13 Mar 2025 13:20:57 GMT, Emanuel Peter wrote: >> Good catch, I'm using Generators now. I'm currently not testing ReverseBytesS/US, but there also isn't support for short/char in Generators. Should I still add tests? 
> > You can always restrict ranges: > > private static final RestrictableGenerator GEN_BYTE = Generators.G.safeRestrict(Generators.G.ints(), Byte.MIN_VALUE, Byte.MAX_VALUE); > private static final RestrictableGenerator GEN_CHAR = Generators.G.safeRestrict(Generators.G.ints(), Character.MIN_VALUE, Character.MAX_VALUE); > private static final RestrictableGenerator GEN_SHORT = Generators.G.safeRestrict(Generators.G.ints(), Short.MIN_VALUE, Short.MAX_VALUE); > private static final RestrictableGenerator GEN_INT = Generators.G.ints(); > private static final RestrictableGenerator GEN_LONG = Generators.G.longs(); > private static final Generator GEN_FLOAT = Generators.G.floats(); > private static final Generator GEN_DOUBLE = Generators.G.doubles(); > private static final RestrictableGenerator GEN_BOOLEAN = Generators.G.safeRestrict(Generators.G.ints(), 0, 1); > > >> I'm currently not testing ReverseBytesS/US, but there also isn't support for short/char in Generators. Should I still add tests? > > And yes, more tests would always be better. Thanks, I added test cases for short and char. Interestingly, there are no reverse (bits) methods for those types. Please let me know if there's anything more I can do. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23851#discussion_r1993587660 From roland at openjdk.org Thu Mar 13 14:03:46 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 13 Mar 2025 14:03:46 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v13] In-Reply-To: References: Message-ID: <8KD12Vx0RwbigLBDr3G8b4xdVJtu0Rlr-ioSPO4S6C0=.3764952c-966e-4757-83ff-fed58ca5555c@github.com> > This change refactors `RShiftI`/`RShiftL` `Ideal`, `Identity` and > `Value` because the `int` and `long` versions are very similar and so > that there's no logic duplication. In the process, support for some extra > transformations is added to `RShiftL`. I also added some new test > cases. 
Roland Westrelin has updated the pull request incrementally with five additional commits since the last revision: - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/mulnode.hpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23438/files - new: https://git.openjdk.org/jdk/pull/23438/files/9c0b859f..1911ca6c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23438&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23438&range=11-12 Stats: 5 lines in 2 files changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23438/head:pull/23438 PR: https://git.openjdk.org/jdk/pull/23438 From galder at openjdk.org Thu Mar 13 14:44:59 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 13 Mar 2025 14:44:59 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 08:13:55 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > @rwestrel @galderz Are you two still working on this or is it ready for someone else to review? 
@eme64 My review was superficial ------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2721511698 From adinn at openjdk.org Thu Mar 13 15:06:09 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 13 Mar 2025 15:06:09 GMT Subject: RFR: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 [v3] In-Reply-To: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> References: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> Message-ID: > This PR reworks the existing AArch64 ML_DSA intrinsic code generator to make it clearer to read and easier to maintain. Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: fix invalid register argument ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24026/files - new: https://git.openjdk.org/jdk/pull/24026/files/3faa59be..366c8159 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24026&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24026&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24026.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24026/head:pull/24026 PR: https://git.openjdk.org/jdk/pull/24026 From adinn at openjdk.org Thu Mar 13 15:17:29 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 13 Mar 2025 15:17:29 GMT Subject: RFR: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 [v4] In-Reply-To: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> References: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> Message-ID: > This PR reworks the existing AArch64 ML_DSA intrinsic code generator to make it clearer to read and easier to maintain. 
Andrew Dinn has updated the pull request incrementally with two additional commits since the last revision: - use references and const to avoid VSeq copying and fix int array arg issue - fix comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24026/files - new: https://git.openjdk.org/jdk/pull/24026/files/366c8159..9ee9eecc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24026&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24026&range=02-03 Stats: 47 lines in 3 files changed: 0 ins; 0 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/24026.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24026/head:pull/24026 PR: https://git.openjdk.org/jdk/pull/24026 From eastigeevich at openjdk.org Thu Mar 13 15:21:58 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 13 Mar 2025 15:21:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 22:05:13 GMT, Chad Rakoczy wrote: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. src/hotspot/share/code/nmethod.cpp line 1599: > 1597: return false; > 1598: } > 1599: We plan to relocate only compiled code. 
We can check this with: if (compiler_type() == compiler_none) { return false; } I think we don't need `is_method_handle_intrinsic()` and `is_continuation_native_intrinsic()` with this check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1993762130 From qamai at openjdk.org Thu Mar 13 15:23:55 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 13 Mar 2025 15:23:55 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: References: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> Message-ID: On Thu, 13 Mar 2025 12:31:14 GMT, Roberto Castañeda Lozano wrote: >>> A store of a char value into a long[] array would be represented at the IR level as a conversion (ConvI2L) followed by a StoreL, no? >> >> No, a code such as this `MemorySegment.ofArray(longArray).set(ValueLayout.JAVA_SHORT, offset, c)` would produce a `StoreC` into a `long[]`. > >> > A store of a char value into a long[] array would be represented at the IR level as a conversion (ConvI2L) followed by a StoreL, no? >> >> No, a code such as this `MemorySegment.ofArray(longArray).set(ValueLayout.JAVA_SHORT, offset, c)` would produce a `StoreC` into a `long[]`. > > Right, in this case I interpret from the [comment at the declaration of `MemNode::memory_type()`](https://github.com/openjdk/jdk/blob/375722f4ab62865c45d8d76f01dc9c7209be57c8/src/hotspot/share/opto/memnode.hpp#L136) that the `memory_type()` of the StoreC node should be `T_SHORT` (the type of the value stored by the node), as opposed to the current `T_CHAR`. I propose to address this in a separate RFE. @robcasloz I disagree, I would expect the `memory_type` of a `StoreC` into a `long[]` to be something that means "a part of a `long[]`", which should be `T_LONG` if the store is guaranteed to be enclosed in a single `long`, or `T_VOID` otherwise. 
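For readers who have not seen this kind of mismatched access, here is a minimal Java sketch of a two-byte store into a `long[]` through a heap `MemorySegment`, in the spirit of the `MemorySegment.ofArray(longArray).set(...)` call quoted above. It assumes JDK 22 or later (where `java.lang.foreign` is final). The class and method names are hypothetical, and a fixed little-endian byte order is an assumption made only so the result is the same on every platform. The sketch shows the Java-level behaviour only; it does not verify which C2 nodes are generated.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;

public class NarrowStoreIntoLongArray {
    // Fixed byte order (an assumption of this sketch) so the result
    // does not depend on the platform's native endianness.
    private static final ValueLayout.OfChar CHAR_LE =
            ValueLayout.JAVA_CHAR.withOrder(ByteOrder.LITTLE_ENDIAN);
    private static final ValueLayout.OfLong LONG_LE =
            ValueLayout.JAVA_LONG.withOrder(ByteOrder.LITTLE_ENDIAN);

    // Stores a single char at the given byte offset inside a one-element
    // long[] and returns the resulting long element.
    static long storeCharIntoLongArray(long byteOffset, char c) {
        long[] longArray = new long[1];
        MemorySegment segment = MemorySegment.ofArray(longArray);
        segment.set(CHAR_LE, byteOffset, c);   // 2-byte store into a long[]
        return segment.get(LONG_LE, 0);        // read the enclosing long back
    }

    public static void main(String[] args) {
        // 'A' is 0x0041; stored at byte offset 2 it lands in bits 16..31.
        System.out.println(Long.toHexString(storeCharIntoLongArray(2, 'A')));
    }
}
```

With an offset that is a multiple of 2 and at most 6, the store stays inside a single `long` element, which is the case the comment above calls "guaranteed to be enclosed in a single `long`".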
------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2721644366 From adinn at openjdk.org Thu Mar 13 15:35:58 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 13 Mar 2025 15:35:58 GMT Subject: RFR: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 [v2] In-Reply-To: References: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> <3F6Qa5TvjBGUXvCBukcqJHG4Q4UgoqU5NmP2uMOHQAM=.cd9894a4-1d51-4b4d-9648-d5855d50d97d@github.com> <7A298xkGx3GJ2Pt6yIEK_DABrosqrbpYXH905hVUyxc=.a7555ea3-ef86-4bd5-a569-c1757ac6f1ab@github.com> Message-ID: On Thu, 13 Mar 2025 12:34:25 GMT, Ferenc Rakoczi wrote: >>> @ferakocz I have modified your generator code to employ vector sequences and auxiliaries that handle iterative loads, stores and math/logic operations over vector sequences. It would be useful to have a review of the code from you and also for you to test it (see comments below re testing) The rewrite has allowed much of the generator logic to be condensed into calls to simple auxiliaries which provides a better mid-level view of how the code is structured. It has also clarified the register use. I think this will be a lot easier for maintainers to understand. A few further comments: >>> >>> 1. I have added some asserts to the montmul operations to ensure that input and output register sequences are either disjoint or overlapping. There may be further opportunities to add asserts in a follow-up. >>> 2. One thing I noted (commented on in code) after switching to passing vector sequences rather than relying on fixed mappings is that some reloading of q and qinv inside loops is unnecessary as the code in the loop does not write the relevant vectors. I left the code as is so that I could check that the generated code is identical to the original but I will move the relevant load outside the loop before pushing. >>> 3. 
I compared before and after disassemblies of the generated code and it is unchanged modulo routine `dilithiumDecomposePoly`. For that intrinsic your generator code wrote successive, intermediate results into the next unused set of 4 vectors, which are in most cases used subsequently to hold a non-temporary result needed by a later computation. My code always writes intermediate results into the last set of 4 vectors (declared as `VSeq<4> vtmp(20)`). As a result my generated code has the same structure but a slightly different register mapping to yours. I don't believe this affects performance but the change does make it clearer how the computed values are being used. >>> 4. As well as comparing disassemblies for the generated code I verified the patch by running test `jdk/sun/security/provider/acvp/ML_DSA_Test.java`. However, I noted a problem with relying on the test as currently implemented since it did not appear to capture some errors in my code. I re-ran the test under the debugger and found that only one of the intrinsics was being exercised (dilithiumAlmostNtt). I confirmed this by adding -XX:+PrintCompilation to the test command line. It seems that all the calls to other intrinsic candidates occurred from the interpreter and did not run ofte... > >> @ferakocz I see that the test problem is being addressed as part of the x86 ML_DSA PR. > > Oh, so you have already found that :-) . @ferakocz Testing with the new `Launcher` run option found one mis-transcribed register set in the decomposePolyJava code - I passed register sequence 4-7 (vs2) when I should have passed sequence 8-11 (vs3). It is now fixed. I also pushed some improvements to the VSeq class and auxiliary methods. These should avoid some unnecessary copying of VSeq instances. They also ought to deal with a rather unintuitive Windows/aarch64 error where the compiler claims an `int[8]` actual argument does not match an `int[N]` template argument (because it treats it as an `int*`). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24026#issuecomment-2721679339 From chagedorn at openjdk.org Thu Mar 13 15:44:54 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Mar 2025 15:44:54 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 13:27:00 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]` as a >> consequence, `i` is known to be `[0, arraySize-1]`. The range check >> can be eliminated. >> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. >> >> What I propose here is that when the `CastII` becomes dead, any CFG >> paths that use the `CastII` node is made unreachable. So in pseudo code: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered. >> >> The patch applies this to all `Type` nodes as with 8275202, I also ran >> in some rare corner cases with other types of nodes. 
The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). >> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII)` >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out of bound value occurs. >> >> This is likely fixable some other way (even though it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that they are more hiding), I think it would >> make sense to try some other way of approaching them. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review > What I propose here is that when the CastII becomes dead, any CFG paths that use the CastII node is made unreachable. Looks like this patch now does it not only for `CastII` nodes but for all `Type` nodes. It's an interesting idea but I'm not quite clear on the impact and how expensive the new recursive output search with a `Halt` node insertion is given that most of the time `Type` nodes will be dying normally together with their control path. It seems like this patch is only to fix some edge cases? But maybe the story is different with JDK-8275202. If it was only about `CastII` (or other `ConstraintCast`), could we also just insert the `Halt` node below the control node the `CastII` is pinned to instead (we now enforce that almost all `ConstraintCast` nodes, including `CastII`, are pinned)? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2721702637 From tschatzl at openjdk.org Thu Mar 13 15:46:00 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 13 Mar 2025 15:46:00 GMT Subject: RFR: 8346470: Improve WriteBarrier JMH to have old-to-young refs In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 15:48:20 GMT, Eric Caspole wrote: > Adds new cases based on the existing ones using an extra iteration setup to allocate into young gen, allowing to test old-to-young stores etc, where the existing code is all old-to-old. > > Here is a run on a standard OCI A1.160 with JDK 25: > > Benchmark Mode Cnt Score Error Units > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3448.826 ± 1.620 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.740 ± 0.151 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.353 ± 0.860 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.565 ± 0.132 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge avgt 12 4476.390 ± 0.930 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall avgt 12 73.517 ± 0.082 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealLarge avgt 12 3103.911 ± 2.079 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealSmall avgt 12 57.549 ± 0.512 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge avgt 12 9587.762 ± 2.044 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall avgt 12 157.244 ± 0.169 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge avgt 12 3103.191 ± 1.100 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall avgt 12 57.392 ± 
0.624 ns/op > WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPath avgt 12 2.668 ± 0.001 ns/op > WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPathYoungRef avgt 12 9.337 ± 0.001 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3449.234 ± 1.840 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.720 ± 0.079 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.055 ± 0.665 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.555 ± 0.116 ns/op > WriteBarrier.Witho... Some minor formatting issues... test/micro/org/openjdk/bench/vm/compiler/WriteBarrier.java line 127: > 125: @Setup(Level.Iteration) > 126: public void setupIteration() { > 127: // Reallocate each iteration to ensure they are in young gen Suggestion: // Reallocate target objects each iteration to ensure they are in young gen. test/micro/org/openjdk/bench/vm/compiler/WriteBarrier.java line 185: > 183: } > 184: } > 185: Suggestion: For consistent spacing between benchmarks. ------------- Changes requested by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24010#pullrequestreview-2682458764 PR Review Comment: https://git.openjdk.org/jdk/pull/24010#discussion_r1993806737 PR Review Comment: https://git.openjdk.org/jdk/pull/24010#discussion_r1993805710 From ecaspole at openjdk.org Thu Mar 13 15:53:43 2025 From: ecaspole at openjdk.org (Eric Caspole) Date: Thu, 13 Mar 2025 15:53:43 GMT Subject: RFR: 8346470: Improve WriteBarrier JMH to have old-to-young refs [v2] In-Reply-To: References: Message-ID: > Adds new cases based on the existing ones using an extra iteration setup to allocate into young gen, allowing to test old-to-young stores etc, where the existing code is all old-to-old. 
> > Here is a run on a standard OCI A1.160 with JDK 25: > > Benchmark Mode Cnt Score Error Units > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3448.826 ± 1.620 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.740 ± 0.151 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.353 ± 0.860 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.565 ± 0.132 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge avgt 12 4476.390 ± 0.930 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall avgt 12 73.517 ± 0.082 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealLarge avgt 12 3103.911 ± 2.079 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealSmall avgt 12 57.549 ± 0.512 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge avgt 12 9587.762 ± 2.044 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall avgt 12 157.244 ± 0.169 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge avgt 12 3103.191 ± 1.100 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall avgt 12 57.392 ± 0.624 ns/op > WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPath avgt 12 2.668 ± 0.001 ns/op > WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPathYoungRef avgt 12 9.337 ± 0.001 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3449.234 ± 1.840 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.720 ± 0.079 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.055 ± 
0.665 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.555 ± 0.116 ns/op > WriteBarrier.Witho... Eric Caspole has updated the pull request incrementally with two additional commits since the last revision: - Update test/micro/org/openjdk/bench/vm/compiler/WriteBarrier.java Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - Update test/micro/org/openjdk/bench/vm/compiler/WriteBarrier.java Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24010/files - new: https://git.openjdk.org/jdk/pull/24010/files/7075365a..46da7dbc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24010&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24010&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24010.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24010/head:pull/24010 PR: https://git.openjdk.org/jdk/pull/24010 From ecaspole at openjdk.org Thu Mar 13 15:55:58 2025 From: ecaspole at openjdk.org (Eric Caspole) Date: Thu, 13 Mar 2025 15:55:58 GMT Subject: RFR: 8346470: Improve WriteBarrier JMH to have old-to-young refs [v2] In-Reply-To: References: Message-ID: <2J5tv5CdcNPngtTD7dTc0N_BU7hR_9oh4naM34JWIqg=.15d13edb-c2a6-4f7c-85ef-cffa5d5eef7c@github.com> On Thu, 13 Mar 2025 15:53:43 GMT, Eric Caspole wrote: >> Adds new cases based on the existing ones using an extra iteration setup to allocate into young gen, allowing to test old-to-young stores etc, where the existing code is all old-to-old. >> >> Here is a run on a standard OCI A1.160 with JDK 25: >> >> Benchmark Mode Cnt Score Error Units >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3448.826 ± 1.620 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.740 ± 
0.151 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.353 ± 0.860 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.565 ± 0.132 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge avgt 12 4476.390 ± 0.930 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall avgt 12 73.517 ± 0.082 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealLarge avgt 12 3103.911 ± 2.079 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealSmall avgt 12 57.549 ± 0.512 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge avgt 12 9587.762 ± 2.044 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall avgt 12 157.244 ± 0.169 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge avgt 12 3103.191 ± 1.100 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall avgt 12 57.392 ± 0.624 ns/op >> WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPath avgt 12 2.668 ± 0.001 ns/op >> WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPathYoungRef avgt 12 9.337 ± 0.001 ns/op >> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3449.234 ± 1.840 ns/op >> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.720 ± 0.079 ns/op >> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.055 ± 0.665 ns/op >> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 1... 
> > Eric Caspole has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/micro/org/openjdk/bench/vm/compiler/WriteBarrier.java > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update test/micro/org/openjdk/bench/vm/compiler/WriteBarrier.java > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> Updated the code with Thomas' suggestions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24010#issuecomment-2721751933 From tschatzl at openjdk.org Thu Mar 13 15:58:58 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 13 Mar 2025 15:58:58 GMT Subject: RFR: 8346470: Improve WriteBarrier JMH to have old-to-young refs [v2] In-Reply-To: References: Message-ID: <5WMDwKLS_c68QKjiBL16cjMUR6b_MQKJ28m36XqIdE8=.78ef226b-bf64-42c7-89be-2eb6a655c7f2@github.com> On Thu, 13 Mar 2025 15:53:43 GMT, Eric Caspole wrote: >> Adds new cases based on the existing ones using an extra iteration setup to allocate into young gen, allowing to test old-to-young stores etc, where the existing code is all old-to-old. >> >> Here is a run on a standard OCI A1.160 with JDK 25: >> >> Benchmark Mode Cnt Score Error Units >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3448.826 ± 1.620 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.740 ± 0.151 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.353 ± 0.860 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.565 ± 0.132 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge avgt 12 4476.390 ± 0.930 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall avgt 12 73.517 ± 0.082 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealLarge avgt 12 3103.911 ± 
2.079 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealSmall avgt 12 57.549 ± 0.512 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge avgt 12 9587.762 ± 2.044 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall avgt 12 157.244 ± 0.169 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge avgt 12 3103.191 ± 1.100 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall avgt 12 57.392 ± 0.624 ns/op >> WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPath avgt 12 2.668 ± 0.001 ns/op >> WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPathYoungRef avgt 12 9.337 ± 0.001 ns/op >> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3449.234 ± 1.840 ns/op >> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.720 ± 0.079 ns/op >> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.055 ± 0.665 ns/op >> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 1... > > Eric Caspole has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/micro/org/openjdk/bench/vm/compiler/WriteBarrier.java > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update test/micro/org/openjdk/bench/vm/compiler/WriteBarrier.java > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> Marked as reviewed by tschatzl (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24010#pullrequestreview-2682518985 From kvn at openjdk.org Thu Mar 13 16:29:59 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 13 Mar 2025 16:29:59 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor In-Reply-To: References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: On Thu, 13 Mar 2025 06:40:27 GMT, Christian Hagedorn wrote: >> This patch cleans the Parse and Template Assertion Predicate elimination code up. We now use a single `PredicateVisitor` and share code in a new `EliminateUselessPredicates` class which contains the code previously found in `PhaseIdealLoop::eliminate_useless_predicates()`. >> >> ### Unified Logic to Clean Up Parse and Template Assertion Predicates >> We now use the following algorithm: >> https://github.com/openjdk/jdk/blob/5e4b6ca0ddafa80eee60690caacd257b74305d4e/src/hotspot/share/opto/predicates.cpp#L1174-L1179 >> >> This is different from the old algorithm where we used a single boolean state `_useless`. But that no longer works because when we first mark Template Assertion Predicates useless, we are no longer visiting them when iterating through predicates: >> >> https://github.com/openjdk/jdk/blob/a21fa463c4f8d067c18c09a072f3cdfa772aea5e/src/hotspot/share/opto/predicates.hpp#L704-L708 >> >> We therefore require a third state. Thus, I introduced a new tri-state `PredicateState` that provides a special `MaybeUseful` value which we can set each Predicate to. >> >> #### Ignoring Useless Parse Predicates >> While working on this patch, I've noticed that we are always visiting Parse Predicates - even when they are useless. We should change that to align it with what we have for the other Predicates (changed in JDK-8351280). To make this work, we also replace the `_useless` state in `ParsePredicateNode` with a new `PredicateState`. 
>> >> #### Sharing Code for Parse and Template Assertion Predicates >> With all the mentioned changes in place, I could nicely share code for the elimination of Parse and Template Assertion Predicates in `EliminateUselessPredicates` by using templates. The following additional changes were required: >> >> - Changing the template parameter of `_template_assertion_predicate_opaques` to the more specific `OpaqueTemplateAssertionPredicateNode` type. >> - Adding accessor methods to get the Predicate lists from `Compile`. >> - Updating `ParsePredicate::mark_useless()` to pass in `PhaseIterGVN`, as done for Assertion Predicates >> >> Note that we still do not directly replace the useless Predicates but rather mark them useless as initiated by JDK-8351280. >> >> ### Other Included Changes >> - During the various refactoring steps, I somehow dropped the code to add newly cloned Template Assertion Predicate to the `_template_assertion_predicate_opaques` list. It was done directly in the old cloning methods. This is not relevant for correctness but could ... > > src/hotspot/share/opto/cfgnode.hpp line 508: > >> 506: void mark_maybe_useful(); >> 507: bool is_useful() const; >> 508: void mark_useful(); > > Needed to move these definitions to the source file because I cannot include `predicates.hpp` here due to circular dependencies. I solved this by forward declaring `PredicateState` and moving the definitions to the source file. > > Same for these methods for `OpaqueTemplateAssertionPredicate` further down. Can you add `predicates.inline.hpp` for this? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r1993897088 From kvn at openjdk.org Thu Mar 13 16:34:56 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 13 Mar 2025 16:34:56 GMT Subject: RFR: 8351938: C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 12:07:50 GMT, Christian Hagedorn wrote: > We currently only print a compilation bailout with `-XX:+PrintCompilation`: > > 7782 90 b 4 Test::main (19 bytes) > 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout > > But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. > > Thanks, > Christian Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24031#pullrequestreview-2682663046 From roland at openjdk.org Thu Mar 13 16:37:01 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 13 Mar 2025 16:37:01 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v14] In-Reply-To: References: Message-ID: > This change refactors `RShiftI`/`RshiftL` `Ideal`, `Identity` and > `Value` because the `int` and `long` versions are very similar and so > there's no logic duplication. In the process, support for some extra > transformations is added to `RShiftL`. I also added some new test > cases. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 30 additional commits since the last revision: - test with Long.Min/Long.Max + CONST64 - Merge branch 'master' into JDK-8349361 - review - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/mulnode.hpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/c2/irTests/RShiftLNodeIdealizationTests.java Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/mulnode.cpp Co-authored-by: Emanuel Peter - ... and 20 more: https://git.openjdk.org/jdk/compare/641bc5e4...a56e397b ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23438/files - new: https://git.openjdk.org/jdk/pull/23438/files/1911ca6c..a56e397b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23438&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23438&range=12-13 Stats: 41691 lines in 579 files changed: 20383 ins; 13825 del; 7483 mod Patch: https://git.openjdk.org/jdk/pull/23438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23438/head:pull/23438 PR: https://git.openjdk.org/jdk/pull/23438 From roland at openjdk.org Thu Mar 13 16:46:12 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 13 Mar 2025 16:46:12 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v12] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 09:31:04 GMT, Christian Hagedorn wrote: > Nice refactoring! I have a few small comments - mostly code style. Otherwise, looks good to me, too. Thanks for the review. New commit should address all your comments. 
Now that the long min/max intrinsic is integrated, I also changed the long tests so they use long min/max and that triggered a bug in the code (missing `CONST64`) that I fixed. > src/hotspot/share/opto/mulnode.cpp line 1361: > >> 1359: if (in(1)->Opcode() == Op_LShift(bt) && >> 1360: in(1)->req() == 3 && >> 1361: in(1)->in(2) == in(2)) { > > Generally, is there notification code for this pattern to re-add the node to the IGVN worklist? If not, I don't think you need to handle it here if it's missing (it's just a missed opportunity but no correctness issue) but it would be good to file a follow-up bug to handle it - especially when we want to add IGVN verification for `Ideal` and `Identity` with [JDK-8347273](https://bugs.openjdk.org/browse/JDK-8347273). Good catch. I fixed a case that was handled for int but not for long. There are others that are missing for both AFAICT. If I file a follow-up bug, writing a test case for it is going to be very hard. So testing a fix is also hard. Shouldn't we wait for JDK-8347273 and fix whatever comes out of that? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23438#issuecomment-2721945944 PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1993924810 From sparasa at openjdk.org Thu Mar 13 16:49:18 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 13 Mar 2025 16:49:18 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v13] In-Reply-To: References: Message-ID: > The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. > > For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html.
Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: - Merge branch 'master' of https://git.openjdk.java.net/jdk into ndd_codegen_jdk - Cleanup BoxLockNode and XXX comments - undo changes to testing implementation - undo BoxLock node fix - restore eorl for RIR - Remove randomly generated test_reg2 for dst= rax test - Update copyright; remove extra lines - remove unused expand blocks;ndd version of orL_rReg_castP2X - ndd version of cmov eq/ne - remove APX support when bmi2 support is absent - ... and 5 more: https://git.openjdk.org/jdk/compare/c29a8d91...a12598b1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23501/files - new: https://git.openjdk.org/jdk/pull/23501/files/51d0e0d4..a12598b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=11-12 Stats: 104542 lines in 2595 files changed: 53525 ins; 33952 del; 17065 mod Patch: https://git.openjdk.org/jdk/pull/23501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23501/head:pull/23501 PR: https://git.openjdk.org/jdk/pull/23501 From roland at openjdk.org Thu Mar 13 17:05:02 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 13 Mar 2025 17:05:02 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 13:27:00 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). 
In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]` as a >> consequence, `i` is known to be `[0, arraySize-1]`. The range check >> can be eliminated. >> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. >> >> What I propose here is that when the `CastII` becomes dead, any CFG >> paths that use the `CastII` node is made unreachable. So in pseudo code: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered. >> >> The patch applies this to all `Type` nodes as with 8275202, I also ran >> in some rare corner cases with other types of nodes. The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). >> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII)` >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. 
Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out of bound value occurs. >> >> This is likely fixable some other way (even though it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that more of them are hiding), I think it would >> make sense to try some other way of approaching them. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for having a look. > Looks like this patch now does it not only for `CastII` nodes but for all `Type` nodes. It's an interesting idea but I'm not quite clear on the impact and how expensive the new recursive output search with a `Halt` node insertion is given that most of the time `Type` nodes will be dying normally together with their control path. It seems like this patch is only to fix some edge cases? But maybe the story is different with JDK-8275202. The goal of the patch is to free us from having to worry about `Type` nodes becoming dead on a CFG path that doesn't fold ever again. With this patch, there's no need for assert predicates AFAIU. I'm not suggesting we throw them away, that said. But how many hours have we spent working on them? This same problem is getting in the way of 8275202. Sure, it's for corner cases but they are corner cases that have no solution other than fragile workarounds or new complicated constructs that will leak into a lot of parts of the compiler (the way code handling assert predicates is all over the place now). Cost in terms of compilation time could be evaluated the usual way. I haven't done any of that but I could. > If it was only about `CastII` (or other `ConstraintCast`), could we also just insert the `Halt` node below the control node the `CastII` is pinned to instead (we now enforce that almost all `ConstraintCast` nodes, including `CastII`, are pinned)?
That's actually the first thing I tried but that doesn't work. A `Cast` can sometimes be dependent on some condition that was hoisted. What makes the `Cast` top is both some condition the `Cast` is control dependent on and the type of its input which can itself be control dependent on some control flow that's dominated by the control of the `Cast`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2722012244 From kxu at openjdk.org Thu Mar 13 17:18:57 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 13 Mar 2025 17:18:57 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 08:03:10 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> add micro benchmark > > src/hotspot/share/opto/addnode.cpp line 407: > >> 405: } >> 406: >> 407: // Try to convert a serial of additions into a single multiplication. Also convert `(a * CON) + a` to `(CON + 1) * a` as > > What about `(a * CON1) + (a * CON2)`? Like `11 * a + 5 * a`. Do we also optimize that? `AddNode::IdealIL` handles the more general associative patterns, converting `(a*b) + (a*c)` into `a*(b + c)`. > src/hotspot/share/opto/addnode.cpp line 413: > >> 411: // power-of-2 addition (e.g., 3 * a => (a << 2) + a). Without this check, GVN would keep trying to optimize the same >> 412: // node and can't progress. For example, 3 * a => (a << 2) + a => 3 * a => (a << 2) + a => ... >> 413: if (find_power_of_two_addition_pattern(this, bt, nullptr) != nullptr) { > > Where does the optimization `3 * a => (a << 2) + a` happen? Do we use `find_power_of_two_addition_pattern` there too? If not: how do we prevent the code of the two locations from diverging in the future? This `3 * a => (a << 2) + a` happens in `MulINode::Ideal()` and is independent from my code.
My code does not explicitly produce power of two patterns but simple multiplication nodes, but such a multiplication node will be later idealized into power of two's. Power of two patterns make my code more complex, but this is out of my control. > how do we prevent the code of the two locations from diverging in the future? I don't expect primitive opts like power-of-two being removed. Even if it does, `find_simple_multiplication_pattern()` will still work. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1993987750 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1993984743 From kxu at openjdk.org Thu Mar 13 17:23:58 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 13 Mar 2025 17:23:58 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 08:06:10 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> add micro benchmark > > src/hotspot/share/opto/addnode.cpp line 480: > >> 478: if (!con->is_Con()) { >> 479: swap(con, base); >> 480: } > > Is that necessary? Does `Mul` not automatically get canonicalized so that the constant is on the rhs? This is not related to `Mul` canonicalization. Swapping those two variables makes my next 4 lines syntactically easier to write. (So I don't have to do `(con->is_Con ? con : base)->is_top()` and so on.) This shouldn't be any more costly with a modern C++ compiler. 
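For readers skimming the thread, the idea behind the series-of-adds optimization discussed above can be sketched outside of C2. What follows is a minimal, hypothetical Java model and not the code from the PR: the `Node` class and the `countMultiples` and `collapse` helpers are names invented here for illustration, and the real `AddNode` logic also has to deal with basic types, `top` inputs and the power-of-two shapes produced by `MulINode::Ideal()`. The sketch only shows how a tree such as `a + a + a + a` or `(3 * a) + a` can be recognized as `4 * a`, including the operand swap that puts the constant in a known position.

```java
import java.util.OptionalLong;

public class AddSeriesSketch {
    enum Op { VAR, CON, ADD, MUL }

    // Hypothetical miniature IR node; it only loosely mirrors a C2 node.
    static final class Node {
        final Op op;
        final long value;     // only meaningful for CON
        final Node lhs, rhs;  // only meaningful for ADD and MUL

        private Node(Op op, long value, Node lhs, Node rhs) {
            this.op = op; this.value = value; this.lhs = lhs; this.rhs = rhs;
        }
        static Node var()               { return new Node(Op.VAR, 0, null, null); }
        static Node con(long v)         { return new Node(Op.CON, v, null, null); }
        static Node add(Node l, Node r) { return new Node(Op.ADD, 0, l, r); }
        static Node mul(Node l, Node r) { return new Node(Op.MUL, 0, l, r); }
    }

    // Returns k if `n` is built only from `base` and `base * CON` terms joined by
    // additions (so n == k * base), or empty if any other term is involved.
    static OptionalLong countMultiples(Node n, Node base) {
        if (n == base) {
            return OptionalLong.of(1);
        }
        if (n.op == Op.MUL) {
            Node con = n.lhs;
            Node other = n.rhs;
            if (con.op != Op.CON) {  // swap so that `con` holds the constant, if there is one
                Node tmp = con; con = other; other = tmp;
            }
            if (con.op == Op.CON && other == base) {
                return OptionalLong.of(con.value);
            }
            return OptionalLong.empty();
        }
        if (n.op == Op.ADD) {
            OptionalLong l = countMultiples(n.lhs, base);
            OptionalLong r = countMultiples(n.rhs, base);
            if (l.isPresent() && r.isPresent()) {
                return OptionalLong.of(l.getAsLong() + r.getAsLong());
            }
        }
        return OptionalLong.empty();
    }

    // Rewrites a matching sum into a single multiplication; otherwise returns `n` unchanged.
    static Node collapse(Node n, Node base) {
        OptionalLong k = countMultiples(n, base);
        if (n.op == Op.ADD && k.isPresent()) {
            return Node.mul(base, Node.con(k.getAsLong()));
        }
        return n;
    }

    public static void main(String[] args) {
        Node a = Node.var();
        Node sum = Node.add(Node.add(Node.add(a, a), a), a);  // a + a + a + a
        Node mixed = Node.add(Node.mul(Node.con(3), a), a);   // (3 * a) + a
        System.out.println(countMultiples(sum, a).getAsLong());    // prints 4
        System.out.println(countMultiples(mixed, a).getAsLong());  // prints 4
    }
}
```

The swap in `countMultiples` plays the same role as the `swap(con, base)` mentioned above: it is a local convenience so that the following checks can assume which variable holds the constant, rather than a statement about how `Mul` nodes are canonicalized.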
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1993995329 From sparasa at openjdk.org Thu Mar 13 17:50:20 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 13 Mar 2025 17:50:20 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v14] In-Reply-To: References: Message-ID: > The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. > > For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: add comments for encoding and UCF ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23501/files - new: https://git.openjdk.org/jdk/pull/23501/files/a12598b1..e9369a40 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=12-13 Stats: 3 lines in 2 files changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23501/head:pull/23501 PR: https://git.openjdk.org/jdk/pull/23501 From sparasa at openjdk.org Thu Mar 13 17:56:00 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 13 Mar 2025 17:56:00 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v12] In-Reply-To: References: <7vVEaiKEa__OeHQl8RjoNE_egAjB4zHpdQ3OKYLgbaI=.8fb97820-8e96-4761-8887-05ea5f4c2373@github.com> Message-ID: On Thu, 13 Mar 2025 07:10:56 GMT, Emanuel Peter wrote: > I had a quick look over this. It's a bit hard to review for me, because it is basically about specific APX instructions. 
We probably have to heavily rely on testing. But APX hardware is not yet available, right? > > How can we best test this? Is there any way to emulate, maybe using SDE? What testing did you run for this? The code was tested using the SDE emulator. Apart from small Java-based unit tests to check if the correct instruction is being emitted, the APX enabling was also tested on SPECjvm2008 workloads. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2722243353 From jbhateja at openjdk.org Thu Mar 13 18:08:09 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 13 Mar 2025 18:08:09 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v14] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 17:50:20 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. >> >> For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > add comments for encoding and UCF LGTM, Please file a JBS on future modification in assembler layer for EEVEX to REX/REX2 encoding and append to this PR before committing. Thanks. ------------- Marked as reviewed by jbhateja (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23501#pullrequestreview-2682967777 From galder at openjdk.org Thu Mar 13 18:15:09 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 13 Mar 2025 18:15:09 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor In-Reply-To: References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: On Thu, 13 Mar 2025 06:48:26 GMT, Christian Hagedorn wrote: >> This patch cleans the Parse and Template Assertion Predicate elimination code up. We now use a single `PredicateVisitor` and share code in a new `EliminateUselessPredicates` class which contains the code previously found in `PhaseIdealLoop::eliminate_useless_predicates()`. >> >> ### Unified Logic to Clean Up Parse and Template Assertion Predicates >> We now use the following algorithm: >> https://github.com/openjdk/jdk/blob/5e4b6ca0ddafa80eee60690caacd257b74305d4e/src/hotspot/share/opto/predicates.cpp#L1174-L1179 >> >> This is different from the old algorithm where we used a single boolean state `_useless`. But that no longer works because when we first mark Template Assertion Predicates useless, we are no longer visiting them when iterating through predicates: >> >> https://github.com/openjdk/jdk/blob/a21fa463c4f8d067c18c09a072f3cdfa772aea5e/src/hotspot/share/opto/predicates.hpp#L704-L708 >> >> We therefore require a third state. Thus, I introduced a new tri-state `PredicateState` that provides a special `MaybeUseful` value which we can set each Predicate to. >> >> #### Ignoring Useless Parse Predicates >> While working on this patch, I've noticed that we are always visiting Parse Predicates - even when they are useless. We should change that to align it with what we have for the other Predicates (changed in JDK-8351280).
To make this work, we also replace the `_useless` state in `ParsePredicateNode` with a new `PredicateState`. >> >> #### Sharing Code for Parse and Template Assertion Predicates >> With all the mentioned changes in place, I could nicely share code for the elimination of Parse and Template Assertion Predicates in `EliminateUselessPredicates` by using templates. The following additional changes were required: >> >> - Changing the template parameter of `_template_assertion_predicate_opaques` to the more specific `OpaqueTemplateAssertionPredicateNode` type. >> - Adding accessor methods to get the Predicate lists from `Compile`. >> - Updating `ParsePredicate::mark_useless()` to pass in `PhaseIterGVN`, as done for Assertion Predicates >> >> Note that we still do not directly replace the useless Predicates but rather mark them useless as initiated by JDK-8351280. >> >> ### Other Included Changes >> - During the various refactoring steps, I somehow dropped the code to add newly cloned Template Assertion Predicate to the `_template_assertion_predicate_opaques` list. It was done directly in the old cloning methods. This is not relevant for correctness but could ... > > src/hotspot/share/opto/predicates.hpp line 331: > >> 329: // the ParsePredicateNode is not marked useless. >> 330: bool is_valid() const { >> 331: return _parse_predicate_node != nullptr && !_parse_predicate_node->is_useless(); > > Avoids visiting useless Parse Predicates during Predicate iteration. When can `_parse_predicate_node` be null? 
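As an aside on the tri-state `PredicateState` described in the quoted PR text: the linked algorithm is not reproduced in this thread, so the following Java sketch is only an assumption about its general shape (mark every predicate maybe-useful, mark the ones still reachable from a loop useful, then treat whatever is left as useless). The `Predicate` and `Loop` classes and the `eliminateUseless` method are hypothetical stand-ins, not HotSpot code. The sketch is meant to illustrate why a plain boolean falls short: if everything were marked useless up front, an iteration that skips useless predicates would never reach them again to mark them useful.

```java
import java.util.ArrayList;
import java.util.List;

public class PredicateStateSketch {
    enum PredicateState { USEFUL, MAYBE_USEFUL, USELESS }

    // Hypothetical stand-in for a Parse or Template Assertion Predicate.
    static final class Predicate {
        final String name;
        PredicateState state = PredicateState.USEFUL;
        Predicate(String name) { this.name = name; }
        boolean isUseless() { return state == PredicateState.USELESS; }
    }

    // Hypothetical loop that knows which predicates sit above it.
    static final class Loop {
        final List<Predicate> predicates = new ArrayList<>();

        // Mirrors the behaviour described in the thread: useless predicates are skipped when iterating.
        List<Predicate> visiblePredicates() {
            List<Predicate> visible = new ArrayList<>();
            for (Predicate p : predicates) {
                if (!p.isUseless()) {
                    visible.add(p);
                }
            }
            return visible;
        }
    }

    static void eliminateUseless(List<Predicate> all, List<Loop> loops) {
        // Step 1: every predicate that is not already useless becomes a candidate.
        for (Predicate p : all) {
            if (!p.isUseless()) {
                p.state = PredicateState.MAYBE_USEFUL;
            }
        }
        // Step 2: predicates still found above some loop are useful.
        // MAYBE_USEFUL ones are visible here, which a boolean "useless" flag would prevent.
        for (Loop loop : loops) {
            for (Predicate p : loop.visiblePredicates()) {
                p.state = PredicateState.USEFUL;
            }
        }
        // Step 3: whatever was never reached is useless.
        for (Predicate p : all) {
            if (p.state == PredicateState.MAYBE_USEFUL) {
                p.state = PredicateState.USELESS;
            }
        }
    }

    public static void main(String[] args) {
        Predicate p1 = new Predicate("p1");
        Predicate p2 = new Predicate("p2");
        Predicate orphan = new Predicate("orphan");  // its loop has gone away
        Loop loop = new Loop();
        loop.predicates.add(p1);
        loop.predicates.add(p2);

        List<Predicate> all = List.of(p1, p2, orphan);
        eliminateUseless(all, List.of(loop));
        for (Predicate p : all) {
            System.out.println(p.name + " -> " + p.state);
        }
    }
}
```

In this model a predicate that has become useless stays useless on later rounds, which matches the note in the PR text that useless predicates are only marked, not replaced directly.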
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r1994083460 From galder at openjdk.org Thu Mar 13 18:21:53 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 13 Mar 2025 18:21:53 GMT Subject: RFR: 8351938: C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: References: Message-ID: <7WxD6leEC6z_D-jVWI42eWABqEgZynDtWHEAT5cQg8s=.29b27503-378c-484e-a786-6e91727dbb05@github.com> On Thu, 13 Mar 2025 12:07:50 GMT, Christian Hagedorn wrote: > We currently only print a compilation bailout with `-XX:+PrintCompilation`: > > 7782 90 b 4 Test::main (19 bytes) > 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout > > But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. > > Thanks, > Christian src/hotspot/share/compiler/compileBroker.cpp line 2377: > 2375: CompilationLog::log()->log_failure(thread, task, failure_reason, retry_message); > 2376: } > 2377: if (PrintCompilation || task->directive()->PrintCompilationOption) { Sounds like a good idea, but is this the only place where we want to do something when either `-XX:+PrintCompilation` or `-XX:CompileCommand=printcompilation,*::*` is set? IOW, shouldn't other checks for `PrintCompilation` also take `-XX:CompileCommand=printcompilation,*::*` into account? E.g. line 2182 above does: if (directive->PrintCompilationOption) { ResourceMark rm; task->print_tty(); } Shouldn't that be: if (PrintCompilation || directive->PrintCompilationOption) { ...
} ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24031#discussion_r1994087988 From ecaspole at openjdk.org Thu Mar 13 18:34:57 2025 From: ecaspole at openjdk.org (Eric Caspole) Date: Thu, 13 Mar 2025 18:34:57 GMT Subject: Integrated: 8346470: Improve WriteBarrier JMH to have old-to-young refs In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 15:48:20 GMT, Eric Caspole wrote: > Adds new cases based on the existing ones using an extra iteration setup to allocate into young gen, allowing us to test old-to-young stores etc., where the existing code is all old-to-old. > > Here is a run on a standard OCI A1.160 with JDK 25: >
> Benchmark                                                                         Mode  Cnt     Score   Error  Units
> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullLarge          avgt   12  3448.826 ± 1.620  ns/op
> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullSmall          avgt   12    63.740 ± 0.151  ns/op
> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungLarge     avgt   12  3448.353 ± 0.860  ns/op
> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungSmall     avgt   12    63.565 ± 0.132  ns/op
> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge    avgt   12  4476.390 ± 0.930  ns/op
> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall    avgt   12    73.517 ± 0.082  ns/op
> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealLarge          avgt   12  3103.911 ± 2.079  ns/op
> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealSmall          avgt   12    57.549 ± 0.512  ns/op
> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge    avgt   12  9587.762 ± 2.044  ns/op
> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall    avgt   12   157.244 ± 0.169  ns/op
> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge  avgt   12  3103.191 ± 1.100  ns/op
> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall  avgt   12    57.392 ± 0.624  ns/op
> WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPath                   avgt   12     2.668 ± 0.001  ns/op
> WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPathYoungRef           avgt   12     9.337 ± 0.001  ns/op
> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullLarge              avgt   12  3449.234 ± 1.840  ns/op
> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullSmall              avgt   12    63.720 ± 0.079  ns/op
> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungLarge         avgt   12  3448.055 ± 0.665  ns/op
> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungSmall         avgt   12    63.555 ± 0.116  ns/op
> WriteBarrier.Witho...
This pull request has now been integrated. Changeset: 03ef79cf Author: Eric Caspole URL: https://git.openjdk.org/jdk/commit/03ef79cf05bdcfc3bb126d004f8f039fb2f4ba9f Stats: 93 lines in 1 file changed: 91 ins; 0 del; 2 mod 8346470: Improve WriteBarrier JMH to have old-to-young refs Reviewed-by: tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/24010 From adinn at openjdk.org Thu Mar 13 19:00:55 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 13 Mar 2025 19:00:55 GMT Subject: RFR: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 [v4] In-Reply-To: References: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> Message-ID: <8dMVU194H31mh_hukgZonYn8upYHd6qQCI404IM9AVU=.8e4e8c24-e557-4ade-9118-15ec757432b2@github.com> On Thu, 13 Mar 2025 15:17:29 GMT, Andrew Dinn wrote: >> This PR reworks the existing AArch64 ML_DSA intrinsic code generator to make it clearer to read and easier to maintain. > > Andrew Dinn has updated the pull request incrementally with two additional commits since the last revision: > > - use references and const to avoid VSeq copying and fix int array arg issue > - fix comment GC test failures appear to be unrelated (especially seeing the same failure on aarch64 and x86 for an aarch64-only change!).
------------- PR Comment: https://git.openjdk.org/jdk/pull/24026#issuecomment-2722401713 From eastigeevich at openjdk.org Thu Mar 13 21:46:58 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 13 Mar 2025 21:46:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 22:05:13 GMT, Chad Rakoczy wrote: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. test/hotspot/jtreg/compiler/whitebox/RelocateAllNMethods.java line 98: > 96: import jdk.test.whitebox.WhiteBox; > 97: > 98: public class RelocateAllNMethods { Relocating all nmethods without a destination looks confusing. You need to specify where to relocate nmethods. Also, there would not be much compiled code. It mostly tests relocation of code which cannot be relocated. IMO it's better to compile a bunch of methods, relocate them, and execute them.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1994342904 From sparasa at openjdk.org Thu Mar 13 22:55:55 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 13 Mar 2025 22:55:55 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v14] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 18:04:55 GMT, Jatin Bhateja wrote: > LGTM, > > Please file a JBS on future modification in assembler layer for EEVEX to REX/REX2 encoding and append to this PR before committing. > > Thanks. Thanks for the review Jatin! The JBS for EEVEX to REX/REX2 demotion has been created: https://bugs.openjdk.org/browse/JDK-8351994 Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2722854725 From bulasevich at openjdk.org Fri Mar 14 00:32:54 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 14 Mar 2025 00:32:54 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v2] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 23:01:10 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
> > Chad Rakoczy has updated the pull request incrementally with three additional commits since the last revision: > > - Add PRODUCT check > - Remove isRelocation flag > - Replace copy constructor with clone function src/hotspot/share/code/nmethod.cpp line 1404: > 1402: memcpy(nm_copy, this, size()); > 1403: > 1404: // Allocate memory and copy immutable data from C heap A new immutable data block is allocated for the new nmethod. Would it be possible to reuse the old one instead? This could help reduce memory allocation overhead, though it would make the logic more complicated. The destroyed nmethod should consider whether it is the last one holding the immutable nmethod blob. Probably, concurrent modification issues could arise. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1994476606 From duke at openjdk.org Fri Mar 14 01:38:51 2025 From: duke at openjdk.org (kuaiwei) Date: Fri, 14 Mar 2025 01:38:51 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v2] In-Reply-To: <6DQFvCIhBHpeuVPw9ZZthZzJrwslPnyanFAvH8TdZQQ=.51d9e0da-f1aa-44af-8963-921c6f641b8f@github.com> References: <6DQFvCIhBHpeuVPw9ZZthZzJrwslPnyanFAvH8TdZQQ=.51d9e0da-f1aa-44af-8963-921c6f641b8f@github.com> Message-ID: On Thu, 13 Mar 2025 09:17:50 GMT, Roberto Casta?eda Lozano wrote: > Hi @kuaiwei, thanks for working on this feature! If this pull request is work in progress as noted in the description, may I suggest switching it back to "Draft" mode until you consider it ready for review? @robcasloz Thanks for your suggestion. This feature is close for review. I may request review soon. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2723107759 From duke at openjdk.org Fri Mar 14 02:30:20 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 14 Mar 2025 02:30:20 GMT Subject: RFR: 8349632: RISC-V: Add Zfa fminm/fmaxm [v5] In-Reply-To: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> References: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> Message-ID: > Add RISCV zfa extension fminm/fmaxm > These two new floating-point instructions can handle NaN inputs directly, which reduces the number of instructions needed to calculate min or max Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: - Merge branch 'openjdk:master' into Zfa_dev_branch - delete useless comment - add temp commit for test - Merge branch 'openjdk:master' into Zfa_dev_branch - 8349632: RISC-V: Add Zfa fminm/fmaxm Change macro-assembler routine to directly call in riscv.ad - JDK-8349632: RISC-V: Add Zfa fminm/fmaxm add zfa predicate - 8349632: RISC-V: Add Zfa fminm/fmaxm delete assert in new add macroAssembly but not the old - JDK-8349632: RISCV: Add Zfa fminm/fmaxm delete assert and change fminm/fmaxm to new match rule - 8349632:RISC-V: Add Zfa fminm/fmaxm ------------- Changes: https://git.openjdk.org/jdk/pull/23509/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23509&range=04 Stats: 80 lines in 2 files changed: 80 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23509/head:pull/23509 PR: https://git.openjdk.org/jdk/pull/23509 From duke at openjdk.org Fri Mar 14 05:51:04 2025 From: duke at openjdk.org (duke) Date: Fri, 14 Mar 2025 05:51:04 GMT Subject: RFR: 8349632: RISC-V: Add Zfa fminm/fmaxm [v5] In-Reply-To: References:
<1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> Message-ID: On Fri, 14 Mar 2025 02:30:20 GMT, Anjian-Wen wrote: >> Add RISCV zfa extension fminm/fmaxm >> This two new Floating-point instructions can deal with NaN input directly, which can decrease instructions when calculate min or max > > Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: > > - Merge branch 'openjdk:master' into Zfa_dev_branch > - delete useless comment > - add temp commit for test > - Merge branch 'openjdk:master' into Zfa_dev_branch > - 8349632: RISC-V: Add Zfa fminm/fmaxm > > Change macro-assembler routine to directly call in riscv.ad > - JDK-8349632: RISC-V: Add Zfa fminm/fmaxm > > add zfa predicate > - 8349632: RISC-V: Add Zfa fminm/fmaxm > > delete assert in new add macroAssembly but not the old > - JDK-8349632: RISCV: Add Zfa fminm/fmaxm > > delete assert and change fminm/fmaxm to new match rule > - 8349632:RISC-V: Add Zfa fminm/fmaxm @Anjian-Wen Your change (at version c65798abf0de263f0ef13f4a6866dac43b0f8c4a) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23509#issuecomment-2723670007 From duke at openjdk.org Fri Mar 14 05:56:03 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 14 Mar 2025 05:56:03 GMT Subject: Integrated: 8349632: RISC-V: Add =?UTF-8?B?WmZhwqBmbWlubS9mbWF4bQ==?= In-Reply-To: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> References: <1MJ5w-srKmQHUSWuKLHdF3-08L3imqBhpSgTmH3-fHI=.dae8666e-5370-49e1-90b5-45c95156f0dc@github.com> Message-ID: On Fri, 7 Feb 2025 06:52:13 GMT, Anjian-Wen wrote: > Add RISCV zfa extension fminm/fmaxm > This two new Floating-point instructions can deal with NaN input directly, which can decrease instructions when calculate min or max This pull request has now been integrated. 
Changeset: a7a09f69 Author: Anjian-Wen Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/a7a09f69abc6c4730599d3de9067c2fde75c5172 Stats: 80 lines in 2 files changed: 80 ins; 0 del; 0 mod 8349632: RISC-V: Add Zfa fminm/fmaxm Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/23509 From epeter at openjdk.org Fri Mar 14 07:16:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Mar 2025 07:16:57 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v12] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 16:42:59 GMT, Roland Westrelin wrote: >> Nice refactoring! I have a few small comments - mostly code style. Otherwise, looks good to me, too. > >> Nice refactoring! I have a few small comments - mostly code style. Otherwise, looks good to me, too. > > Thanks for the review. > New commit should address all your comments. > Now that the long min/max intrinsic is integrated, I also changed the long tests so they use long min/max and that triggered a bug in the code (missing `CONST64`) that I fixed. @rwestrel I saw this in testing from yesterday: `linux-aarch64-debug` with `-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders`. Failed IR Rules (2) of Methods (2) ---------------------------------- 1) Method "public long compiler.c2.irTests.RShiftLNodeIdealizationTests.test8(long)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#RSHIFT_L#_", "1", "_#LSHIFT_L#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(RShiftL.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 
* Constraint 2: "(\\d+(\\s){2}(LShiftL.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 2) Method "public long compiler.c2.irTests.RShiftLNodeIdealizationTests.test9(long)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#RSHIFT_L#_", "1", "_#LSHIFT_L#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(RShiftL.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(LShiftL.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23438#issuecomment-2723833440 From epeter at openjdk.org Fri Mar 14 07:17:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Mar 2025 07:17:00 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v14] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 16:37:01 GMT, Roland Westrelin wrote: >> This change refactors `RShiftI`/`RshiftL` `Ideal`, `Identity` and >> `Value` because the `int` and `long` versions are very similar and so >> there's no logic duplication. In the process, support for some extra >> transformations is added to `RShiftL`. I also added some new test >> cases. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 30 additional commits since the last revision: > > - test with Long.Min/Long.Max + CONST64 > - Merge branch 'master' into JDK-8349361 > - review > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/mulnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/RShiftLNodeIdealizationTests.java > > Co-authored-by: Emanuel Peter > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Emanuel Peter > - ... and 20 more: https://git.openjdk.org/jdk/compare/4116d9cc...a56e397b Not sure if that still reproduces after your changes. LMK when I should run testing again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23438#issuecomment-2723834769 From fyang at openjdk.org Fri Mar 14 07:53:01 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Mar 2025 07:53:01 GMT Subject: RFR: 8352011: RISC-V: Two IR tests fail after JDK-8351662 Message-ID: Hi, please review this small change fixing two IR tests. These two tests which are enabled for riscv64 by JDK-8351662. But they fail on linux-riscv64 platforms where there is no support for RVV. They are expecting vector operations thus requires RVV support on this platform. Testing: Same tests will be skipped on platforms without RVV after this change. 
Tagging @Hamlin-Li ------------- Commit messages: - 8352011: RISC-V: Two IR tests fail after JDK-8351662 Changes: https://git.openjdk.org/jdk/pull/24048/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24048&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352011 Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24048.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24048/head:pull/24048 PR: https://git.openjdk.org/jdk/pull/24048 From chagedorn at openjdk.org Fri Mar 14 07:55:00 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 14 Mar 2025 07:55:00 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v14] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 16:37:01 GMT, Roland Westrelin wrote: >> This change refactors `RShiftI`/`RshiftL` `Ideal`, `Identity` and >> `Value` because the `int` and `long` versions are very similar and so >> there's no logic duplication. In the process, support for some extra >> transformations is added to `RShiftL`. I also added some new test >> cases. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 30 additional commits since the last revision: > > - test with Long.Min/Long.Max + CONST64 > - Merge branch 'master' into JDK-8349361 > - review > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/mulnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/RShiftLNodeIdealizationTests.java > > Co-authored-by: Emanuel Peter > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Emanuel Peter > - ... and 20 more: https://git.openjdk.org/jdk/compare/b4bf3a00...a56e397b Apart from Emanuel's report, it looks good to me, thanks for the update! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23438#pullrequestreview-2684620531 From chagedorn at openjdk.org Fri Mar 14 07:55:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 14 Mar 2025 07:55:01 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v12] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 16:41:23 GMT, Roland Westrelin wrote: > If I file a follow up bug, writing a test case for it is going to be very hard Yes, it will be. There is currently no way to really verify that reliably without the additional verification. I think it's okay to wait for JDK-8347273. Maybe you can add a note there or file a separate issue to keep track of the missing bits detected in this investigation.. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23438#discussion_r1995038216 From syan at openjdk.org Fri Mar 14 08:04:51 2025 From: syan at openjdk.org (SendaoYan) Date: Fri, 14 Mar 2025 08:04:51 GMT Subject: RFR: 8352011: RISC-V: Two IR tests fail after JDK-8351662 In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 07:46:40 GMT, Fei Yang wrote: > Hi, please review this small change fixing two IR tests. > > These two tests which are enabled for riscv64 by JDK-8351662. But they fail on linux-riscv64 platforms where there is no support for RVV. They are expecting vector operations thus requires RVV support on this platform. > > Testing: Same tests will be skipped on platforms without RVV after this change. Tagging @Hamlin-Li LGTM ------------- Marked as reviewed by syan (Committer). PR Review: https://git.openjdk.org/jdk/pull/24048#pullrequestreview-2684650192 From mli at openjdk.org Fri Mar 14 09:11:51 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 14 Mar 2025 09:11:51 GMT Subject: RFR: 8352011: RISC-V: Two IR tests fail after JDK-8351662 In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 07:46:40 GMT, Fei Yang wrote: > Hi, please review this small change fixing two IR tests. > > These two tests which are enabled for riscv64 by JDK-8351662. But they fail on linux-riscv64 platforms where there is no support for RVV. They are expecting vector operations thus requires RVV support on this platform. > > Testing: Same tests will be skipped on platforms without RVV after this change. Tagging @Hamlin-Li Looks good. Thanks! ------------- Marked as reviewed by mli (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24048#pullrequestreview-2684871756 From xgong at openjdk.org Fri Mar 14 09:34:47 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 14 Mar 2025 09:34:47 GMT Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v3] In-Reply-To: References: Message-ID: > Since PR [1] has added several new vector operations in VectorAPI and the X86 backend implementation for them, this patch adds the AArch64 backend part for NEON/SVE architectures. > > The performance of Vector API relative JMH micro benchmarks can improve about 70x ~ 95x on a NVIDIA Grace CPU, which is a 128-bit vector length sve2 architecture, with different UseSVE options. Here is the gain details: > > > Benchmark (size) Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2 > ByteMaxVector.SADD 1024 thrpt 30 80.69x 79.70x 80.534x > ByteMaxVector.SADDMasked 1024 thrpt 30 84.08x 85.72x 85.901x > ByteMaxVector.SSUB 1024 thrpt 30 80.46x 80.27x 81.063x > ByteMaxVector.SSUBMasked 1024 thrpt 30 83.96x 85.26x 85.887x > ByteMaxVector.SUADD 1024 thrpt 30 80.43x 80.36x 81.761x > ByteMaxVector.SUADDMasked 1024 thrpt 30 83.40x 84.62x 85.199x > ByteMaxVector.SUSUB 1024 thrpt 30 79.93x 79.22x 79.714x > ByteMaxVector.SUSUBMasked 1024 thrpt 30 82.93x 85.02x 84.726x > ByteMaxVector.UMAX 1024 thrpt 30 78.73x 77.39x 78.220x > ByteMaxVector.UMAXMasked 1024 thrpt 30 82.62x 84.77x 85.531x > ByteMaxVector.UMIN 1024 thrpt 30 79.04x 77.80x 78.471x > ByteMaxVector.UMINMasked 1024 thrpt 30 83.11x 84.86x 86.126x > IntMaxVector.SADD 1024 thrpt 30 83.11x 83.07x 83.183x > IntMaxVector.SADDMasked 1024 thrpt 30 90.67x 91.80x 93.162x > IntMaxVector.SSUB 1024 thrpt 30 83.37x 82.82x 83.317x > IntMaxVector.SSUBMasked 1024 thrpt 30 90.85x 92.87x 94.201x > IntMaxVector.SUADD 1024 thrpt 30 82.76x 81.78x 82.679x > IntMaxVector.SUADDMasked 1024 thrpt 30 90.49x 91.93x 93.155x > IntMaxVector.SUSUB 1024 thrpt 30 82.92x 82.34x 82.525x > 
IntMaxVector.SUSUBMasked 1024 thrpt 30 90.60x 92.12x 92.951x > IntMaxVector.UMAX 1024 thrpt 30 82.40x 81.85x 82.242x > IntMaxVector.UMAXMasked 1024 thrpt 30 90.30x 92.10x 92.587x > IntMaxVector.UMIN 1024 thrpt 30 82.84x 81.43x 82.801x > IntMaxVector.UMINMasked 1024 thrpt 30 90.43x 91.49x 92.678x > LongMaxVector.SADD 1024 thrpt 30 82.01x 81.74x 82.153x > LongMaxVector... Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Fix IR test failure on X64 with UseAVX=1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23608/files - new: https://git.openjdk.org/jdk/pull/23608/files/9aa97fb0..30bbbde5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23608&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23608&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23608.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23608/head:pull/23608 PR: https://git.openjdk.org/jdk/pull/23608 From xgong at openjdk.org Fri Mar 14 09:34:50 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 14 Mar 2025 09:34:50 GMT Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2] In-Reply-To: <5fk0CfImI-utRMfmsA78i_uQJjoej9bsLqxsAbRZHLk=.b5aece2b-090d-4fbc-aa64-3acf8fedc41b@github.com> References: <5fk0CfImI-utRMfmsA78i_uQJjoej9bsLqxsAbRZHLk=.b5aece2b-090d-4fbc-aa64-3acf8fedc41b@github.com> Message-ID: On Thu, 13 Mar 2025 09:32:20 GMT, Emanuel Peter wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: >> >> - Merge branch 'jdk:master' into JDK_8349522 >> - 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations >> >> Since PR [1] has added several new vector operations in VectorAPI >> and the X86 backend implementation for them, this patch adds the >> AArch64 backend part for NEON/SVE architectures. >> >> The performance of Vector API relative jmh micro benchmarks can >> improve about 70x ~ 95x on an AArch64 128-bit vector length sve2 >> architecture with different UseSVE options. Here is the uplift >> details: >> >> ``` >> Benchmark (size) Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2 >> ByteMaxVector.SADD 1024 thrpt 30 80.69x 79.70x 80.534x >> ByteMaxVector.SADDMasked 1024 thrpt 30 84.08x 85.72x 85.901x >> ByteMaxVector.SSUB 1024 thrpt 30 80.46x 80.27x 81.063x >> ByteMaxVector.SSUBMasked 1024 thrpt 30 83.96x 85.26x 85.887x >> ByteMaxVector.SUADD 1024 thrpt 30 80.43x 80.36x 81.761x >> ByteMaxVector.SUADDMasked 1024 thrpt 30 83.40x 84.62x 85.199x >> ByteMaxVector.SUSUB 1024 thrpt 30 79.93x 79.22x 79.714x >> ByteMaxVector.SUSUBMasked 1024 thrpt 30 82.93x 85.02x 84.726x >> ByteMaxVector.UMAX 1024 thrpt 30 78.73x 77.39x 78.220x >> ByteMaxVector.UMAXMasked 1024 thrpt 30 82.62x 84.77x 85.531x >> ByteMaxVector.UMIN 1024 thrpt 30 79.04x 77.80x 78.471x >> ByteMaxVector.UMINMasked 1024 thrpt 30 83.11x 84.86x 86.126x >> IntMaxVector.SADD 1024 thrpt 30 83.11x 83.07x 83.183x >> IntMaxVector.SADDMasked 1024 thrpt 30 90.67x 91.80x 93.162x >> IntMaxVector.SSUB 1024 thrpt 30 83.37x 82.82x 83.317x >> IntMaxVector.SSUBMasked 1024 thrpt 30 90.85x 92.87x 94.201x >> IntMaxVector.SUADD 1024 thrpt 30 82.76x 81.78x 82.679x >> IntMaxVector.SUADDMasked 1024 thrpt 30 90.49x 91.93x 93.155x >> IntMaxVector.SUSUB 1024 thrpt 30 82.92x 82.34x 82.525x >> IntMaxVector.SUSUBMasked 1024 thrpt 30 90.60x ... > > I'm getting this failure with `-XX:UseAVX=1` on x64. It is a new test you added. 
> > Failed IR Rules (1) of Methods (1) > ---------------------------------- > 1) Method "public void compiler.vectorapi.VectorSaturatedOperationsTest.susub_masked()" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={BEFORE_MATCHING}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"avx", "true", "asimd", "true"}, counts={"_#V#SATURATING_SUB_VL#_", " >0 ", "unsigned_vector_node", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "Before matching": > - counts: Graph contains wrong number of nodes: > * Constraint 1: "(\\d+(\\s){2}(SaturatingSubV.*)+(\\s){2}===.*vector[A-Za-z])" > - Failed comparison: [found] 0 > 0 [given] > - No nodes matched! > * Constraint 2: "unsigned_vector_node" > - Failed comparison: [found] 0 > 0 [given] > - No nodes matched! Hi @eme64 , the IR test failure is fixed. Would you mind re-running the test again? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23608#issuecomment-2724149673 From xgong at openjdk.org Fri Mar 14 09:39:17 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 14 Mar 2025 09:39:17 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v2] In-Reply-To: References: Message-ID: > The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. > > Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. 
However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. > > This patch added the rearrange support for vector types with small lane count. Here are the main changes: > - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) > - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation > - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one > > Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. > > 1) NEON > > JMH on panama-vector:vectorIntrinsics: > > Benchmark (size) Mode Cnt Units Before After Gain > Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x > Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x > Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x > Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x > Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x > Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x > Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x > Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x > Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x > Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x > Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x > Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.999 747.848 10.38x > > > JMH on jdk mainline: > > Benchmark ... 
Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Add the IR test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23790/files - new: https://git.openjdk.org/jdk/pull/23790/files/8934fae6..c0ebfa43 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23790&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23790&range=00-01 Stats: 252 lines in 2 files changed: 252 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23790.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23790/head:pull/23790 PR: https://git.openjdk.org/jdk/pull/23790 From chagedorn at openjdk.org Fri Mar 14 09:40:55 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 14 Mar 2025 09:40:55 GMT Subject: RFR: 8351938: C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: <7WxD6leEC6z_D-jVWI42eWABqEgZynDtWHEAT5cQg8s=.29b27503-378c-484e-a786-6e91727dbb05@github.com> References: <7WxD6leEC6z_D-jVWI42eWABqEgZynDtWHEAT5cQg8s=.29b27503-378c-484e-a786-6e91727dbb05@github.com> Message-ID: <1kOLqjY6QeYcrHB-OjFKDy9gtmTd80E5jT0XKL416a0=.477c43c7-a69a-4160-85d3-8457c64f4c33@github.com> On Thu, 13 Mar 2025 18:15:39 GMT, Galder Zamarreño wrote: >> We currently only print a compilation bailout with `-XX:+PrintCompilation`: >> >> 7782 90 b 4 Test::main (19 bytes) >> 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout >> >> But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. >> >> Thanks, >> Christian > > src/hotspot/share/compiler/compileBroker.cpp line 2377: > >> 2375: CompilationLog::log()->log_failure(thread, task, failure_reason, retry_message); >> 2376: } >> 2377: if (PrintCompilation || task->directive()->PrintCompilationOption) { > > Sounds like a good idea, but is this the only place where we want to do something when either `-XX:+PrintCompilation` or `-XX:CompileCommand=printcompilation,*::*` is set? 
IOW, shouldn't other checks for `PrintCompilation` also take `-XX:CompileCommand=printcompilation,*::*` into account? > > E.g. line 2182 above does: > > > if (directive->PrintCompilationOption) { > ResourceMark rm; > task->print_tty(); > } > > > Shouldn't that be: > > > if (PrintCompilation || directive->PrintCompilationOption) { > ... > } I totally agree that there are other places where we have this mismatch - there are actually quite a few. This might require some more effort. I suggest moving forward with this small patch and filing a separate RFE for revisiting other uses. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24031#discussion_r1995237381 From chagedorn at openjdk.org Fri Mar 14 09:40:56 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 14 Mar 2025 09:40:56 GMT Subject: RFR: 8351938: C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: <1kOLqjY6QeYcrHB-OjFKDy9gtmTd80E5jT0XKL416a0=.477c43c7-a69a-4160-85d3-8457c64f4c33@github.com> References: <7WxD6leEC6z_D-jVWI42eWABqEgZynDtWHEAT5cQg8s=.29b27503-378c-484e-a786-6e91727dbb05@github.com> <1kOLqjY6QeYcrHB-OjFKDy9gtmTd80E5jT0XKL416a0=.477c43c7-a69a-4160-85d3-8457c64f4c33@github.com> Message-ID: On Fri, 14 Mar 2025 09:36:41 GMT, Christian Hagedorn wrote: >> src/hotspot/share/compiler/compileBroker.cpp line 2377: >> >>> 2375: CompilationLog::log()->log_failure(thread, task, failure_reason, retry_message); >>> 2376: } >>> 2377: if (PrintCompilation || task->directive()->PrintCompilationOption) { >> >> Sounds like a good idea, but is this the only place where we want to do something when either `-XX:+PrintCompilation` or `-XX:CompileCommand=printcompilation,*::*` is set? IOW, shouldn't other checks for `PrintCompilation` also take `-XX:CompileCommand=printcompilation,*::*` into account? >> >> E.g. 
line 2182 above does: >> >> >> if (directive->PrintCompilationOption) { >> ResourceMark rm; >> task->print_tty(); >> } >> >> >> Shouldn't that be: >> >> >> if (PrintCompilation || directive->PrintCompilationOption) { >> ... >> } > > I totally agree that there are other places where we have this mismatch - there are actually quite a few. This might require some more effort. I suggest moving forward with this small patch and filing a separate RFE for revisiting other uses. What do you think? I guess there are also flags where we have this mismatch between global flag and compile command local flag. Maybe we should do a general pass over all the compile commands and check how we can align them better with their global flag. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24031#discussion_r1995238758 From rcastanedalo at openjdk.org Fri Mar 14 09:46:08 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 14 Mar 2025 09:46:08 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: References: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> Message-ID: On Thu, 13 Mar 2025 17:58:26 GMT, Galder Zamarreño wrote: > > ... explore creating a dedicated `StoreS` node in a separate RFE. > > Why not do this in this PR? Seems like the right approach to me. My thinking is that this is a bug whose fix we might want to backport to several JDK Update releases. 
The fix proposed in this PR is minimal and local to the array fill optimization, whereas the alternative approach of defining a `StoreS` node (see prototype [here](https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-8351468-array-fill-optimization-new-store-node)) 1) is more costly to apply due to its larger changeset, and 2) incurs a significantly higher risk of introducing regressions, as it affects the entire C2 compilation chain (for example, I found while prototyping it that it affects the output of the store merging optimization). See [the OpenJDK Developers' Guide](https://openjdk.org/guide/#backporting) for a more elaborate discussion of the trade-offs involved in backporting. Having said this, I still think we should consider introducing a `StoreS` node in a follow-up RFE, and perhaps also enforcing consistent type abbreviations across load and store node names, e.g. renaming `LoadUSNode` to `LoadCNode`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2724174002 From xgong at openjdk.org Fri Mar 14 09:48:11 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 14 Mar 2025 09:48:11 GMT Subject: RFR: 8351627: C2 AArch64 ROR/ROL: assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value Message-ID: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> The following assertion fails on AArch64: Internal Error (jdk/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:2991), pid=3822987, tid=3823007 assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value with a simple Vector API case: public static IntVector test() { IntVector iv = IntVector.zero(IntVector.SPECIES_128); return iv.lanewise(VectorOperators.ROR, iv); } On AArch64, vector `ROR/ROL` (rotate right/left) operations are implemented with a combination of shifts. 
Please see the pattern for `ROR`:

    lsr dst1, src, cnt            // unsigned right shift
    lsl dst2, src, bitSize - cnt  // left shift
    orr dst, dst1, dst2           // logical or

where `bitSize` is the element type width (e.g. `32` for `int`). In the above case, `cnt` is a zero constant, resulting in a left shift of 32 (`bitSize - 0`), which exceeds the instruction's valid shift count range and triggers the assertion.

To fix this, we need to mask the shift count to ensure it stays within the valid range when calculating shift counts for rotate operations: `shiftCnt = shiftCnt & (bitSize - 1)`. Note that the mask is only necessary for constant shift counts. This not only fixes the assertion failure, but also allows `ROR/ROL src, 0` to be optimized to `src` directly.

For vector variables as shift counts, the masking can be safely omitted because:

1. Vector shift instructions that take a vector register as the shift count may not automatically apply modulo arithmetic based on register size. When the shift count is `32` for int type, the result may be either `zeros` or `src`. However, this doesn't affect correctness for rotate since the final result is combined with `src` using a logical `OR` operation.
2. It saves a vector logical `AND` for masking, which is good for performance.
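For readers who want to experiment with the constant-count case outside the VM, here is a minimal scalar sketch in Java. It is not the HotSpot implementation: the class `RotateMaskSketch`, its methods and the `checkedShift` helper are hypothetical names, and the range check merely stands in for the assembler assertion quoted above. It assumes `int` lanes, i.e. `bitSize == 32`.

```java
// Hypothetical scalar model of the lsr/lsl/orr rotate lowering; not HotSpot code.
final class RotateMaskSketch {
    static final int BIT_SIZE = 32; // element width for int lanes

    // Stand-in for an immediate-shift encoding that only accepts counts in [0, BIT_SIZE).
    static int checkedShift(int count) {
        if (count < 0 || count >= BIT_SIZE) {
            throw new IllegalArgumentException("Invalid shift value: " + count);
        }
        return count;
    }

    // ROR with both constant shift counts masked by (BIT_SIZE - 1).
    static int rotateRight(int src, int cnt) {
        int right = checkedShift(cnt & (BIT_SIZE - 1));
        int left = checkedShift((BIT_SIZE - cnt) & (BIT_SIZE - 1));
        return (src >>> right) | (src << left); // lsr, lsl, orr
    }

    // Same lowering without the mask: cnt == 0 asks for a left shift of 32 and is rejected.
    static int rotateRightUnmasked(int src, int cnt) {
        int right = checkedShift(cnt);
        int left = checkedShift(BIT_SIZE - cnt);
        return (src >>> right) | (src << left);
    }

    public static void main(String[] args) {
        // With the mask, a zero count degenerates to src | src, i.e. src itself.
        System.out.println(Integer.toHexString(rotateRight(0x12345678, 0)));
        System.out.println(Integer.toHexString(rotateRight(0x12345678, 8)));
    }
}
```

In this model a zero count turns both shifts into shifts by zero, so the result is `src | src`, which matches the observation that `ROR/ROL src, 0` can be folded to `src`.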
------------- Commit messages: - 8351627: C2 AArch64 ROR/ROL: assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value Changes: https://git.openjdk.org/jdk/pull/24051/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24051&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351627 Stats: 306 lines in 2 files changed: 305 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24051.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24051/head:pull/24051 PR: https://git.openjdk.org/jdk/pull/24051 From roland at openjdk.org Fri Mar 14 09:52:00 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 14 Mar 2025 09:52:00 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v14] In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 07:13:59 GMT, Emanuel Peter wrote: > Not sure if that still reproduces after your changes. LMK when I should run testing again. I could reproduce that one by forcing one of the random values to a particular constant. It's a bug in the test that has since been fixed. @eme64 could you run testing again, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23438#issuecomment-2724184341 From rcastanedalo at openjdk.org Fri Mar 14 09:57:51 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 14 Mar 2025 09:57:51 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: References: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> Message-ID: On Thu, 13 Mar 2025 15:21:17 GMT, Quan Anh Mai wrote: > I would expect the `memory_type` of a `StoreC` into a `long[]` to be something that means "a part of a `long[]`" If that was the intended meaning of `MemNode::memory_type()`, wouldn't the function be redundant, because we can retrieve that information from `MemNode::adr_type()` already? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2724199373 From roland at openjdk.org Fri Mar 14 09:59:59 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 14 Mar 2025 09:59:59 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v3] In-Reply-To: References: Message-ID: > The `arraycopy` writes to a non escaping array so its `ArrayCopy` node > is marked as having a narrow memory effect. One of the loads from the > destination after the copy is transformed into a load from the source > array (the rationale being that if there's no load from the > destination of the copy, the `arraycopy` is not needed). The load from > the source has the input memory state of the `ArrayCopy` as memory > input. That load is then sunk out of the loop and its control is > updated to be after the `ArrayCopy`. That's legal because the > `ArrayCopy` only has a narrow memory effect and can't modify the > source. The `ArrayCopy` can't be eliminated and is expanded. In the > process, a `MemBar` that has a wide memory effect is added. The load > from the source has control after the membar but memory state before > and because the membar has a wide memory effect, the load is anti > dependent on the membar: the graph is broken (the load can't be pinned > after the membar and anti dependent on it). > > In short, the problem is that the graph is transformed under the > assumption that the `ArrayCopy` has a narrow effect but the > `ArrayCopy` is expanded to a subgraph that has a wide memory > effect. The fix I propose is to not insert a membar with a wide memory > effect. We still need a membar when the destination is non escaping > because the expanded `ArrayCopy`, if it writes to a tightly allocated > array, writes to raw memory and not to the destination memory slice. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - more - Merge branch 'master' into JDK-8341976 - more - exp - fix - Merge branch 'master' into HEAD - review - whitespace - fix & test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23465/files - new: https://git.openjdk.org/jdk/pull/23465/files/06ee02de..aa7b4478 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=01-02 Stats: 109608 lines in 2712 files changed: 55077 ins; 36266 del; 18265 mod Patch: https://git.openjdk.org/jdk/pull/23465.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23465/head:pull/23465 PR: https://git.openjdk.org/jdk/pull/23465 From chagedorn at openjdk.org Fri Mar 14 10:02:54 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 14 Mar 2025 10:02:54 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor In-Reply-To: References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: On Thu, 13 Mar 2025 16:27:05 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/opto/cfgnode.hpp line 508: >> >>> 506: void mark_maybe_useful(); >>> 507: bool is_useful() const; >>> 508: void mark_useful(); >> >> Needed to move these definitions to the source file because I cannot include `predicates.hpp` here due to circular dependencies. I solved this by forward declaring `PredicateState` and moving the definitions to the source file. >> >> Same for these methods for `OpaqueTemplateAssertionPredicate` further down. > > Can you add `predicates.inline.hpp` for this? Do you mean the enum definition for `PredicateState`? That could work, so I can just include that header when I need to use the enum. 
Should I then rather name the `hpp` file something like `predicates_enums.hpp`? IIUC, `xyz.inline.hpp` should be used primarily for inline methods. Having the separate enum header also allows us to put this one there which makes things cleaner: https://github.com/openjdk/jdk/blob/e3c29c9e6cff7648952c0ba359b0763a0ea8da18/src/hotspot/share/opto/predicates.hpp#L203-L211 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r1995268997 From xgong at openjdk.org Fri Mar 14 10:04:50 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 14 Mar 2025 10:04:50 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v3] In-Reply-To: References: Message-ID: > The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. > > Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. > > This patch added the rearrange support for vector types with small lane count. 
Here are the main changes: > - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) > - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation > - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one > > Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. > > 1) NEON > > JMH on panama-vector:vectorIntrinsics: > > Benchmark (size) Mode Cnt Units Before After Gain > Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x > Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x > Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x > Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x > Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x > Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x > Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x > Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x > Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x > Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x > Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x > Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.999 747.848 10.38x > > > JMH on jdk mainline: > > Benchmark ... Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merge branch 'jdk:master' into JDK-8350463 - Add the IR test - 8350463: AArch64: Add vector rearrange support for small lane count vectors The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). 
This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. This patch added the rearrange support for vector types with small lane count. Here are the main changes: - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. 
1) NEON JMH on panama-vector:vectorIntrinsics: ``` Benchmark (size) Mode Cnt Units Before After Gain Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.999 747.848 10.38x ``` JMH on jdk mainline: ``` Benchmark (SIZE) Mode Cnt Units Before After Gain SelectFromBenchmark.rearrangeFromDoubleVector 1024 thrpt 30 ops/ms 44.593 1319.977 29.63x SelectFromBenchmark.rearrangeFromDoubleVector 2048 thrpt 30 ops/ms 22.318 660.061 29.58x SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 30 ops/ms 45.823 1458.144 31.82x SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 30 ops/ms 23.050 729.881 31.67x VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 30 ops/ms 97.210 1082.884 11.14x VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 30 ops/ms 48.642 541.341 11.13x VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 30 ops/ms 24.285 270.419 11.14x VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 30 ops/ms 12.421 135.115 10.88x ``` 2) SVE JMH on panama-vector:vectorIntrinsics: ``` Benchmark (size) Mode Cnt Units Before After Gain Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.396 577.744 7.37x Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.119 2538.261 35.19x Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.992 
2536.972 34.75x Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.400 561.934 7.26x Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.858 2949.076 41.61x Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 70.654 2954.273 41.81x Int64Vector.rearrange 1024 thrpt 30 ops/ms 77.851 563.969 7.24x Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 67.433 1510.484 22.39x Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 66.614 1511.617 22.69x Long128Vector.rearrange 1024 thrpt 30 ops/ms 77.637 579.021 7.46x Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 69.886 1274.331 18.23x Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 70.069 1273.787 18.17x ``` JMH on jdk mainline: ``` Benchmark (SIZE) Mode Cnt Units Before After Gain SelectFromBenchmark.rearrangeFromDoubleVector 1024 thrpt 30 ops/ms 44.612 1351.850 30.30x SelectFromBenchmark.rearrangeFromDoubleVector 2048 thrpt 30 ops/ms 22.315 676.314 30.31x SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 30 ops/ms 46.372 1502.036 32.39x SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 30 ops/ms 23.361 749.133 32.07x VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 30 ops/ms 97.780 1759.061 17.99x VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 30 ops/ms 48.923 879.584 17.98x VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 30 ops/ms 24.219 439.588 18.15x VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 30 ops/ms 12.416 219.603 17.69x ``` [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L209 [2] https://bugs.openjdk.org/browse/JDK-8310691 ------------- Changes: https://git.openjdk.org/jdk/pull/23790/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23790&range=02 Stats: 421 lines in 6 files changed: 312 ins; 86 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/23790.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23790/head:pull/23790 PR: https://git.openjdk.org/jdk/pull/23790 From chagedorn at openjdk.org Fri Mar 14 10:08:04 2025 From: chagedorn at openjdk.org 
(Christian Hagedorn) Date: Fri, 14 Mar 2025 10:08:04 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor In-Reply-To: References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: <8MH_b7tBCjvP_Z-MkIu8ScAv3bOVC0JTu5QsYExmMYs=.df431d50-110b-4926-9a4b-4b1d5fb3537c@github.com> On Thu, 13 Mar 2025 18:12:00 GMT, Galder Zamarreño wrote: >> src/hotspot/share/opto/predicates.hpp line 331: >> >>> 329: // the ParsePredicateNode is not marked useless. >>> 330: bool is_valid() const { >>> 331: return _parse_predicate_node != nullptr && !_parse_predicate_node->is_useless(); >> >> Avoids visiting useless Parse Predicates during Predicate iteration. > > When can `_parse_predicate_node` be null? When iterating through predicates, we use a `PredicateBlockIterator` for each Predicate Block, which consists of an optional Parse Predicate and a number of Regular Predicates (Runtime and Assertion Predicates). We could have either already removed the Parse Predicate before here: https://github.com/openjdk/jdk/blob/e3c29c9e6cff7648952c0ba359b0763a0ea8da18/src/hotspot/share/opto/loopnode.cpp#L5055-L5066 or there is just no Parse Predicate for this loop. 
So, when initializing the `PredicateBlockIterator`, we could pass here a non-Parse-Predicate projection to the constructor of `ParsePredicate`: https://github.com/openjdk/jdk/blob/e3c29c9e6cff7648952c0ba359b0763a0ea8da18/src/hotspot/share/opto/predicates.hpp#L747 We then set `_parse_predicate_node` to null here due to this mismatch: https://github.com/openjdk/jdk/blob/e3c29c9e6cff7648952c0ba359b0763a0ea8da18/src/hotspot/share/opto/predicates.hpp#L293-L296 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r1995275721 From roland at openjdk.org Fri Mar 14 10:10:12 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 14 Mar 2025 10:10:12 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v2] In-Reply-To: <_WNbZuoO2hjfrclTBVHy2hkv17-SwHdGELID-dtg58I=.70e29f75-6cb5-4e5b-8b94-493fd1200b85@github.com> References: <_WNbZuoO2hjfrclTBVHy2hkv17-SwHdGELID-dtg58I=.70e29f75-6cb5-4e5b-8b94-493fd1200b85@github.com> Message-ID: On Tue, 18 Feb 2025 16:10:22 GMT, Damon Fenacci wrote: > > Thanks for the report. I can't reproduce it, though. Do you pass any command line options? > > Nothing specific, just a simple `jtreg` command with a debug build and no extra options, i.e. `jtreg -va -jdk:../build/linux-x64-debug/jdk test/hotspot/jtreg/compiler/arraycopy/TestArrayCopyOverflowInBoundChecks.java` on an Intel Xeon machine (with avx512) with Ubuntu. That failure appears to be caused by a bug in `ArrayCopyNode::may_modify()`. That code is used to find if an array copy writes to a particular memory slice. It handles graph shapes for array copy before (`ArrayCopyNode`) and after expansion (`CallNode` + `MemBarNode`). For that it does some pattern matching that breaks when `ArrayOperationPartialInlineSize` is not zero: then, there's an extra `Region` that the current code doesn't expect. 
Rather than fix the pattern matching, I propose always marking the trailing `MemBar` as is already done in some case. That should make that code more robust (and more conservative). ------------- PR Comment: https://git.openjdk.org/jdk/pull/23465#issuecomment-2724224298 From roland at openjdk.org Fri Mar 14 10:28:13 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 14 Mar 2025 10:28:13 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v13] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. 
Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 38 commits: - Merge branch 'master' into JDK-8342692 - merge - Merge branch 'master' into JDK-8342692 - Merge branch 'master' into JDK-8342692 - whitespace - Merge branch 'master' into JDK-8342692 - TestMemorySegment test fix - test wip - Merge branch 'master' into JDK-8342692 - refactor - ... 
and 28 more: https://git.openjdk.org/jdk/compare/4e51a8c9...8877698d ------------- Changes: https://git.openjdk.org/jdk/pull/21630/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=12 Stats: 1310 lines in 25 files changed: 1250 ins; 13 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From chagedorn at openjdk.org Fri Mar 14 10:32:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 14 Mar 2025 10:32:01 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor [v2] In-Reply-To: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: > This patch cleans the Parse and Template Assertion Predicate elimination code up. We now use a single `PredicateVisitor` and share code in a new `EliminateUselessPredicates` class which contains the code previously found in `PhaseIdealLoop::eliminate_useless_predicates()`. > > ### Unified Logic to Clean Up Parse and Template Assertion Predicates > We now use the following algorithm: > https://github.com/openjdk/jdk/blob/5e4b6ca0ddafa80eee60690caacd257b74305d4e/src/hotspot/share/opto/predicates.cpp#L1174-L1179 > > This is different from the old algorithm where we used a single boolean state `_useless`. But that does no longer work because when we first mark Template Assertion Predicates useless, we are no longer visiting them when iterating through predicates: > > https://github.com/openjdk/jdk/blob/a21fa463c4f8d067c18c09a072f3cdfa772aea5e/src/hotspot/share/opto/predicates.hpp#L704-L708 > > We therefore require a third state. 
Thus, I introduced a new tri-state `PredicateState` that provides a special `MaybeUseful` value which we can set each Predicate to. > > #### Ignoring Useless Parse Predicates > While working on this patch, I've noticed that we are always visiting Parse Predicates - even when they are useless. We should change that to align it with what we have for the other Predicates (changed in JDK-8351280). To make this work, we also replace the `_useless` state in `ParsePredicateNode` with a new `PredicateState`. > > #### Sharing Code for Parse and Template Assertion Predicates > With all the mentioned changes in place, I could nicely share code for the elimination of Parse and Template Assertion Predicates in `EliminateUselessPredicates` by using templates. The following additional changes were required: > > - Changing the template parameter of `_template_assertion_predicate_opaques` to the more specific `OpaqueTemplateAssertionPredicateNode` type. > - Adding accessor methods to get the Predicate lists from `Compile`. > - Updating `ParsePredicate::mark_useless()` to pass in `PhaseIterGVN`, as done for Assertion Predicates > > Note that we still do not directly replace the useless Predicates but rather mark them useless as initiated by JDK-8351280. > > ### Other Included Changes > - During the various refactoring steps, I somehow dropped the code to add newly cloned Template Assertion Predicate to the `_template_assertion_predicate_opaques` list. It was done directly in the old cloning methods. This is not relevant for correctness but could hinder some optimizations. I've added the code now i... 
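The tri-state marking described in the quoted text above can be sketched roughly as follows. This is a minimal illustration in Java rather than HotSpot's C++, and it is not the actual `EliminateUselessPredicates` code. The class and method names (`PredicateStateSketch`, `Predicate`, `visit`, `eliminateUseless`) are hypothetical, and the three steps are an assumption about what the linked algorithm does: mark everything `MaybeUseful`, mark what predicate iteration still reaches as useful, then treat whatever is left as useless.

```java
import java.util.ArrayList;
import java.util.List;

public class PredicateStateSketch {
    // Hypothetical Java mirror of the tri-state mentioned in the description.
    enum PredicateState { USELESS, MAYBE_USEFUL, USEFUL }

    static final class Predicate {
        final String name;
        PredicateState state = PredicateState.USEFUL;

        Predicate(String name) {
            this.name = name;
        }
    }

    // Stand-in for predicate iteration: useless predicates are skipped, which is
    // why a plain "useless" boolean set up front would hide them from this walk.
    static void visit(List<Predicate> reachableFromLoops) {
        for (Predicate p : reachableFromLoops) {
            if (p.state == PredicateState.USELESS) {
                continue;
            }
            p.state = PredicateState.USEFUL;
        }
    }

    // Returns the names of the predicates that end up marked useless.
    static List<String> eliminateUseless(List<Predicate> all, List<Predicate> reachableFromLoops) {
        // Step 1 (assumed): every known predicate starts out as "maybe useful".
        for (Predicate p : all) {
            p.state = PredicateState.MAYBE_USEFUL;
        }
        // Step 2 (assumed): predicates still found from a loop become useful.
        visit(reachableFromLoops);
        // Step 3 (assumed): anything not reached is marked useless.
        List<String> useless = new ArrayList<>();
        for (Predicate p : all) {
            if (p.state == PredicateState.MAYBE_USEFUL) {
                p.state = PredicateState.USELESS;
                useless.add(p.name);
            }
        }
        return useless;
    }

    public static void main(String[] args) {
        Predicate a = new Predicate("A");
        Predicate b = new Predicate("B");
        Predicate c = new Predicate("C");
        // Only A and C are still reachable from some loop in this toy setup.
        System.out.println(eliminateUseless(List.of(a, b, c), List.of(a, c)));
    }
}
```

The point of the intermediate state is that a predicate marked `MAYBE_USEFUL` is still visited in step 2, whereas one marked useless up front would be skipped by the iteration.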
Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Introduce predicates_enums.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24013/files - new: https://git.openjdk.org/jdk/pull/24013/files/a5611e3e..1508a3be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24013&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24013&range=00-01 Stats: 150 lines in 6 files changed: 82 ins; 58 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24013.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24013/head:pull/24013 PR: https://git.openjdk.org/jdk/pull/24013 From chagedorn at openjdk.org Fri Mar 14 10:32:02 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 14 Mar 2025 10:32:02 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor [v2] In-Reply-To: References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: On Fri, 14 Mar 2025 10:00:05 GMT, Christian Hagedorn wrote: >> Can you add `predicates.inline.hpp` for this? > > Do you mean the enum definition for `PredicateState`? That could work, so I can just include that header when I need to use the enum. Should I then rather name the `hpp` file something like `predicates_enums.hpp`? IIUC, `xyz.inline.hpp` should be used primarily for inline methods. > > Having the separate enum header also allows us to put this one there which makes things cleaner: > https://github.com/openjdk/jdk/blob/e3c29c9e6cff7648952c0ba359b0763a0ea8da18/src/hotspot/share/opto/predicates.hpp#L203-L211 I've pushed an update introducing `predicates_enums.hpp` and also moving some typedefs for predicates in there. Let me know if that's what you had in mind. 
I could now move almost all definitions back to the header file except for `mark_useless()` which uses `PhaseIterGVN` which is not a complete type in the header file - I guess that's okay to just move this one to the source file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r1995305281 From roland at openjdk.org Fri Mar 14 10:40:54 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 14 Mar 2025 10:40:54 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v14] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. 
Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... 
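As a rough illustration of the iteration-count question raised in the quoted description, here is a sketch in Java (not C2 code) of counting the iterations of a long counted loop without overflow and comparing the result against a short-loop threshold. The names `ShortLoopSketch`, `tripCount` and `isShortRunning` are hypothetical, and the use of unsigned 64-bit arithmetic is this sketch's own choice, not necessarily how the patch does it. The threshold of 1000 mirrors the `ShortLoopIter` default mentioned above.

```java
public class ShortLoopSketch {
    // Mathematical iteration count of: for (long i = init; i < limit; i += stride)
    // with stride > 0, ignoring any wrap-around of i itself.
    // The result is an unsigned 64-bit value stored in a long.
    static long tripCount(long init, long limit, long stride) {
        if (stride <= 0) {
            throw new IllegalArgumentException("stride must be positive");
        }
        if (init >= limit) {
            return 0;
        }
        // limit - init may wrap as a signed value, but it is exact when read as unsigned.
        long range = limit - init;
        long quotient = Long.divideUnsigned(range, stride);
        long remainder = Long.remainderUnsigned(range, stride);
        // Round up; quotient + 1 cannot wrap because a non-zero remainder implies stride >= 2.
        return remainder == 0 ? quotient : quotient + 1;
    }

    // True if the loop runs for at most maxIters iterations.
    static boolean isShortRunning(long init, long limit, long stride, long maxIters) {
        return Long.compareUnsigned(tripCount(init, limit, stride), maxIters) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(tripCount(0, 10, 3));
        System.out.println(Long.toUnsignedString(tripCount(Long.MIN_VALUE, Long.MAX_VALUE, 1)));
        System.out.println(isShortRunning(0, 1000, 1, 1000));
        System.out.println(isShortRunning(0, 1001, 1, 1000));
    }
}
```

Promoting to a wider type, as the existing int logic is described as doing, is not available for `long` operands in plain Java, which is why the sketch falls back on the unsigned interpretation of the difference.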
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: merge fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/8877698d..34571869 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=12-13 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From rraj at openjdk.org Fri Mar 14 10:55:11 2025 From: rraj at openjdk.org (Rohit Arul Raj) Date: Fri, 14 Mar 2025 10:55:11 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 Message-ID: This change enables optimized SIMD sort for AMD Zen 4 (AVX2) & Zen 5 (AVX512). JTREG Tests: Completed Tier1 & Tier2 tests on Zen4 & Zen5 - No Regressions. Attaching ArraySort performance data for Zen4 & Zen5. 
[Zen4-ArraySort-Data.txt](https://github.com/user-attachments/files/19245831/Zen4-ArraySort-Data.txt) [Zen5-ArraySort-Data.txt](https://github.com/user-attachments/files/19245833/Zen5-ArraySort-Data.txt) ------------- Commit messages: - JDK-8317976: Enable optimized SIMD sort for AMD Zen 4 & Zen 5 Changes: https://git.openjdk.org/jdk/pull/24053/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24053&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8317976 Stats: 9 lines in 2 files changed: 5 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24053/head:pull/24053 PR: https://git.openjdk.org/jdk/pull/24053 From chagedorn at openjdk.org Fri Mar 14 10:57:04 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 14 Mar 2025 10:57:04 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: Message-ID: <2OJ3-DbdBBiHjZKVDGNXb-IFLqXi0jI4Ezz_zs6NsMA=.a904b56a-a3c4-4f06-a708-70223beccb60@github.com> On Thu, 13 Mar 2025 17:02:00 GMT, Roland Westrelin wrote: > The goal of the patch is to free us from having to worry about Type nodes becoming dead on a cfg path that doesn't fold ever again. That's surely a strong advantage. It would also solve the problem of `CastII` becoming top on the uncommon path of a div by zero check. AFAIR, we have not solved that problem but just worked around it by removing the `CastII::Value()` optimization to improve types on such paths - which was not satisfying but worked. That optimization could be reintroduced as well as a follow-up to this patch. > With this patch, there's no need for assert predicates AFAIU. I've thought about that, too. But I think they are not completely useless: IIUC, if we have a `CastII` that's only used on some branch inside the loop, we would now just make this one branch unreachable with this patch but not clean up entire loop (which is actually dead). 
Assertion Predicates on the other hand kill the entire loop and prevent us from spending more time trying to optimize it in any way. Another consideration for Assertion Predicates is data updates when splitting loops further that we applied Loop Predication to: We need to ensure that we correctly update all data dependencies on hoisted checks to avoid executing them too early. These data dependencies are hard to keep track of when we want to update them and don't have dedicated nodes where they can always be found. Template Assertion Predicates solve this because they are only removed once loop opts are over. But if we only need them for keeping track of data dependencies, they can of course become a lot easier and serve as nops. So, I think Assertion Predicates can still be useful as part of Loop Predication and Range Check Elimination. But we have many more places where `Type` nodes could die and break the graph. This patch can indeed solve this in general and I'm definitely in favor of having this patch to solve this problem and future problems with `Type` nodes once and for all. > That's actually the first thing I tried but that doesn't work. A Cast can sometimes be dependent on some condition that was hoisted. What makes the Cast top is both some condition the Cast is control dependent on and the type of its input which can itself be control dependent on some control flow that's dominated by the control of the Cast. I see, makes sense. > Cost in terms of compilation time could be evaluated the usual way. I haven't done any of that but I could. If we agree that we do not see an alternative approach to solve these problems, I think we could accept some increased compilation time. Anyhow, would still be good to get some numbers to have a better picture. 
> have no solution other than fragile workarounds or new complicated construct that will leak in a lot of parts of the compiler I guess when moving forward with this patch we should still consider how easy it is to keep data and control in sync when adding new `Type` nodes and not just rely on this patch to make things right. We might otherwise fail to remove some other dead nodes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2724311497 From duke at openjdk.org Fri Mar 14 11:39:15 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 14 Mar 2025 11:39:15 GMT Subject: RFR: 8352022: RISC-V: Support Zfa fminm_h/fmaxm_h for float16 Message-ID: To support float16, add the Zfa fminm/fmaxm instructions for the float16 type. This is still a draft that needs testing; it is related to https://bugs.openjdk.org/browse/JDK-8345298 ------------- Commit messages: - RISCV: support Zfa fminm_h/fmaxm_h with type float16 - RISC-V: support Zfa fminm_h/fmaxm_h with type float16 Changes: https://git.openjdk.org/jdk/pull/24047/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24047&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352022 Stats: 41 lines in 2 files changed: 41 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24047.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24047/head:pull/24047 PR: https://git.openjdk.org/jdk/pull/24047 From galder at openjdk.org Fri Mar 14 11:54:58 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 14 Mar 2025 11:54:58 GMT Subject: RFR: 8351938: C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 12:07:50 GMT, Christian Hagedorn wrote: > We currently only print a compilation bailout with `-XX:+PrintCompilation`: > > 7782 90 b 4 Test::main (19 bytes) > 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout > > But not when using `-XX:CompileCommand=printcompilation,*::*`. 
This patch enables this. > > Thanks, > Christian Marked as reviewed by galder (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/24031#pullrequestreview-2685278620 From qamai at openjdk.org Fri Mar 14 11:54:58 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 14 Mar 2025 11:54:58 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: References: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> Message-ID: On Fri, 14 Mar 2025 09:54:53 GMT, Roberto Castañeda Lozano wrote: >> @robcasloz I disagree, I would expect the `memory_type` of a `StoreC` into a `long[]` to be something that means "a part of a `long[]`", which should be `T_LONG` if the store is guaranteed to be enclosed in a single `long`, or `T_VOID` otherwise. While we are trying to store 2 bytes into the memory, the thing in the memory is neither a `short` nor a `char`. > >> I would expect the `memory_type` of a `StoreC` into a `long[]` to be something that means "a part of a `long[]`" > > If that was the intended meaning of `MemNode::memory_type()`, wouldn't the function be redundant, because we can retrieve that information from `MemNode::adr_type()` already? @robcasloz Yes that's right. Then `MemNode::memory_type()` does not refer to the thing in memory at all, but the thing that is about to interact with the memory. I think: - We should rename it to `MemNode::value_type()` or `MemNode::value_basic_type()` - It is simply incorrect to use it to reason about the thing in the memory in this problem, and using `adr_type` is the correct fix. To be clear, I don't think having `StoreSNode` would solve any issue. I can `StoreS` into a `char[]`, and `StoreC` into a `short[]` and we are back at the same issue. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2724445592 From galder at openjdk.org Fri Mar 14 11:54:59 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 14 Mar 2025 11:54:59 GMT Subject: RFR: 8351938: C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: References: <7WxD6leEC6z_D-jVWI42eWABqEgZynDtWHEAT5cQg8s=.29b27503-378c-484e-a786-6e91727dbb05@github.com> <1kOLqjY6QeYcrHB-OjFKDy9gtmTd80E5jT0XKL416a0=.477c43c7-a69a-4160-85d3-8457c64f4c33@github.com> Message-ID: On Fri, 14 Mar 2025 09:37:45 GMT, Christian Hagedorn wrote: >> I totally agree that there are other places where we have this mismatch - there are actually quite a few. This might require some more effort. I suggest to move forward with this small patch and file a separate RFE for revisiting other uses. What do you think? > > I guess there are also flags where we have this mismatch between global flag and compile command local flag. Maybe we should do a general pass over all the compile commands and check how we can align them better with their global flag.
Yes sounds good ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24031#discussion_r1995417707 From galder at openjdk.org Fri Mar 14 12:05:04 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 14 Mar 2025 12:05:04 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor [v2] In-Reply-To: <8MH_b7tBCjvP_Z-MkIu8ScAv3bOVC0JTu5QsYExmMYs=.df431d50-110b-4926-9a4b-4b1d5fb3537c@github.com> References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> <8MH_b7tBCjvP_Z-MkIu8ScAv3bOVC0JTu5QsYExmMYs=.df431d50-110b-4926-9a4b-4b1d5fb3537c@github.com> Message-ID: <3tCQN8D1zXp5tlsKvHGdGfC1h93ah968l12KF5wHcKA=.ef2ce1dc-0e51-4303-9fb1-02246ef873e6@github.com> On Fri, 14 Mar 2025 10:05:22 GMT, Christian Hagedorn wrote: >> When can `_parse_predicate_node` be null? > > When iterating through predicates, we use a `PredicateBlockIterator` for each Predicate Block, which consists of an optional Parse Predicate and a number of Regular Predicates (Runtime and Assertion Predicates). We could have either already removed the Parse Predicate before here: > https://github.com/openjdk/jdk/blob/e3c29c9e6cff7648952c0ba359b0763a0ea8da18/src/hotspot/share/opto/loopnode.cpp#L5055-L5066 > > or there is just no Parse Predicate for this loop. So, when initializing the `PredicateBlockIterator`, we could pass here a non-Parse-Predicate projection to the constructor of `ParsePredicate`: > https://github.com/openjdk/jdk/blob/e3c29c9e6cff7648952c0ba359b0763a0ea8da18/src/hotspot/share/opto/predicates.hpp#L747 > > We then set `_parse_predicate_node` to null here due to this mismatch: > https://github.com/openjdk/jdk/blob/e3c29c9e6cff7648952c0ba359b0763a0ea8da18/src/hotspot/share/opto/predicates.hpp#L293-L296 Is the last code snippet the relevant one? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r1995432535 From eastigeevich at openjdk.org Fri Mar 14 12:06:55 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 14 Mar 2025 12:06:55 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v2] In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 00:30:07 GMT, Boris Ulasevich wrote: >> Chad Rakoczy has updated the pull request incrementally with three additional commits since the last revision: >> >> - Add PRODUCT check >> - Remove isRelocation flag >> - Replace copy constructor with clone function > > src/hotspot/share/code/nmethod.cpp line 1404: > >> 1402: memcpy(nm_copy, this, size()); >> 1403: >> 1404: // Allocate memory and copy immutable data from C heap > > A new immutable data block is allocated for the new nmethod. Would it be possible to reuse the old one instead? This could help reduce memory allocation overhead, though it would make the logic more complicated. The destroyed nmethod should consider whether it is the last one holding the immutable nmethod blob. Probably, concurrent modification issues could arise. What do you think? > A new immutable data block is allocated for the new nmethod. Would it be possible to reuse the old one instead? ... What do you think? You are reading our minds :) Yes, we have discussed this. This sharing must be thread safe. We need to research whether it won't increase complexity. 
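
One possible shape for such thread-safe sharing is a reference count guarded by a lock. The sketch below is an assumption made for illustration, not the PR's code: `SharedBlob`, `g_blob_lock`, `blob_create`, `blob_share` and `blob_release` are hypothetical names, and a plain `std::mutex` stands in for whatever lock the VM would actually hold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>

// Hypothetical stand-in for an immutable data block shared by several owners.
struct SharedBlob {
  int refs;              // number of owners currently using the block
  std::size_t size;      // payload size in bytes
  unsigned char* bytes;  // payload, never modified after creation
};

// Hypothetical global lock protecting every refs update.
std::mutex g_blob_lock;

// Create a block with a single owner and copy the payload into it.
SharedBlob* blob_create(const void* src, std::size_t size) {
  SharedBlob* b = new SharedBlob{1, size, new unsigned char[size]};
  std::memcpy(b->bytes, src, size);
  return b;
}

// Register one more owner instead of copying the payload.
SharedBlob* blob_share(SharedBlob* b) {
  std::lock_guard<std::mutex> guard(g_blob_lock);
  assert(b->refs > 0);
  ++b->refs;
  return b;
}

// Drop one owner; the last owner frees the block. Returns true if it was freed.
bool blob_release(SharedBlob* b) {
  std::lock_guard<std::mutex> guard(g_blob_lock);
  assert(b->refs > 0);
  if (--b->refs > 0) {
    return false;
  }
  delete[] b->bytes;
  delete b;
  return true;
}
```

The extra complexity mentioned above shows up in `blob_release`: the destroying owner has to check whether it is the last one, and every update of the count has to happen under the same lock.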
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1995435057 From chagedorn at openjdk.org Fri Mar 14 12:16:08 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 14 Mar 2025 12:16:08 GMT Subject: RFR: 8351938: C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 12:07:50 GMT, Christian Hagedorn wrote: > We currently only print a compilation bailout with `-XX:+PrintCompilation`: > > 7782 90 b 4 Test::main (19 bytes) > 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout > > But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. > > Thanks, > Christian Thank you all for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24031#issuecomment-2724504267 From chagedorn at openjdk.org Fri Mar 14 12:16:08 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 14 Mar 2025 12:16:08 GMT Subject: RFR: 8351938: C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: References: <7WxD6leEC6z_D-jVWI42eWABqEgZynDtWHEAT5cQg8s=.29b27503-378c-484e-a786-6e91727dbb05@github.com> <1kOLqjY6QeYcrHB-OjFKDy9gtmTd80E5jT0XKL416a0=.477c43c7-a69a-4160-85d3-8457c64f4c33@github.com> Message-ID: On Fri, 14 Mar 2025 11:52:31 GMT, Galder Zamarre?o wrote: >> I guess there are also flags where we have this mismatch between global flag and compile command local flag. Maybe we should do a general pass over all the compile commands and check how we can align them better with their global flag. > > Yes sounds good I filed [JDK-8352047](https://bugs.openjdk.org/browse/JDK-8352047). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24031#discussion_r1995444435 From chagedorn at openjdk.org Fri Mar 14 12:16:08 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 14 Mar 2025 12:16:08 GMT Subject: Integrated: 8351938: C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 12:07:50 GMT, Christian Hagedorn wrote: > We currently only print a compilation bailout with `-XX:+PrintCompilation`: > > 7782 90 b 4 Test::main (19 bytes) > 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout > > But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. > > Thanks, > Christian This pull request has now been integrated. Changeset: 65c5282f Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/65c5282f4b83343062571736b7d34ddb147ea39c Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8351938: C2: Print compilation bailouts with PrintCompilation compile command Reviewed-by: epeter, thartmann, kvn, galder ------------- PR: https://git.openjdk.org/jdk/pull/24031 From chagedorn at openjdk.org Fri Mar 14 12:17:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 14 Mar 2025 12:17:52 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor [v2] In-Reply-To: <3tCQN8D1zXp5tlsKvHGdGfC1h93ah968l12KF5wHcKA=.ef2ce1dc-0e51-4303-9fb1-02246ef873e6@github.com> References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> <8MH_b7tBCjvP_Z-MkIu8ScAv3bOVC0JTu5QsYExmMYs=.df431d50-110b-4926-9a4b-4b1d5fb3537c@github.com> <3tCQN8D1zXp5tlsKvHGdGfC1h93ah968l12KF5wHcKA=.ef2ce1dc-0e51-4303-9fb1-02246ef873e6@github.com> Message-ID: On Fri, 14 Mar 2025 12:02:10 GMT, Galder Zamarre?o wrote: >> When iterating through predicates, we use a `PredicateBlockIterator` for each Predicate Block, which consists of 
an optional Parse Predicate and a number of Regular Predicates (Runtime and Assertion Predicates). We could have either already removed the Parse Predicate before here: >> https://github.com/openjdk/jdk/blob/e3c29c9e6cff7648952c0ba359b0763a0ea8da18/src/hotspot/share/opto/loopnode.cpp#L5055-L5066 >> >> or there is just no Parse Predicate for this loop. So, when initializing the `PredicateBlockIterator`, we could pass here a non-Parse-Predicate projection to the constructor of `ParsePredicate`: >> https://github.com/openjdk/jdk/blob/e3c29c9e6cff7648952c0ba359b0763a0ea8da18/src/hotspot/share/opto/predicates.hpp#L747 >> >> We then set `_parse_predicate_node` to null here due to this mismatch: >> https://github.com/openjdk/jdk/blob/e3c29c9e6cff7648952c0ba359b0763a0ea8da18/src/hotspot/share/opto/predicates.hpp#L293-L296 > > Is the last code snippet the relevant one? Sorry, I copy-pasted the wrong snippet. The null is coming from here: https://github.com/openjdk/jdk/blob/e3c29c9e6cff7648952c0ba359b0763a0ea8da18/src/hotspot/share/opto/predicates.cpp#L71-L82 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r1995447225 From roland at openjdk.org Fri Mar 14 13:35:55 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 14 Mar 2025 13:35:55 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: <2OJ3-DbdBBiHjZKVDGNXb-IFLqXi0jI4Ezz_zs6NsMA=.a904b56a-a3c4-4f06-a708-70223beccb60@github.com> References: <2OJ3-DbdBBiHjZKVDGNXb-IFLqXi0jI4Ezz_zs6NsMA=.a904b56a-a3c4-4f06-a708-70223beccb60@github.com> Message-ID: On Fri, 14 Mar 2025 10:51:45 GMT, Christian Hagedorn wrote: > So, I think Assertion Predicates can still be useful as part of Loop Predication and Range Check Elimination. But we have many more places where `Type` nodes could die and break the graph. 
This patch can indeed solve this in general and I'm definitely in favor of having this patch to solve this problem and future problems with `Type` nodes once and for all.

That makes sense and I agree with that. One benefit of predicates (including assert predicates) is that they help narrow types further. So:

    for (int i = start; i < stop; i++) {
      v += array[i];
    }

is transformed by predication into:

    if (start < 0 || start >= array.length) { trap(); }
    if (stop < 0 || stop >= array.length) { trap(); }
    for (int i = start; i < stop; i++) {
      v += array[i];
    }

which the type propagation of 8275202 analyzes as:

    if (start < 0 || start >= array.length) { trap(); }
    if (stop < 0 || stop >= array.length) { trap(); }
    // start >= 0, stop >= 0 here
    for (int i = start; i < stop; i++) {
      // i >= 0
      v += array[i];
    }

Capturing `i >= 0` in the loop `Phi` or array address `CastII` or `ConvI2L` then enables better use of addressing modes on x86. Except, narrowing the type of the `Phi` or `CastII` exposes C2 to the exact bug this PR tries to fix: what if the loop becomes unreachable but C2 can't fold it away and the `Phi` or `CastII` ends up having an out-of-range input?

> I guess when moving forward with this patch we should still consider how easy it is to keep data and control in sync when adding new `Type` nodes and not just rely on this patch to make things right. We might otherwise miss to remove some other dead nodes.

For the test case that I added for this bug, the issue is that some `CastII` transformations widen the types of some nodes. I suppose the way to fix this would be to restrict those transformations so widening doesn't happen in some cases. It's going to be tricky (because widening happens so that mostly identical `CastII` nodes can be commoned to improve code quality) and fragile (if, to preserve performance, we choose to restrict those transformations only in a few targeted cases).
For 8275202, what I tried doing is that when the new pass proves a condition constant, rather than constant folding the condition, it marks the test as always failing/succeeding (so `(If (Bool ...))` is transformed into `(If (Opaque4 (Bool ...)))`) and the `Opaque4` captures the final result of the `Bool`. Then the `Opaque4` constant folds later. I found several issues with this:
- How late is late enough so that constant folding the opaque node is safe? In the worst case (and that applies to assert predicates too), as long as a transformation can happen, the `Opaque4` shouldn't be folded. But given that folding the `Opaque4` causes more transformations to happen...
- How do we prevent this mechanism from leaking elsewhere? Say the condition is in a loop. C2 considered the loop for unrolling but the loop body was too big. With the condition folded, the loop body shrinks and we could unroll. That doesn't work with the `Opaque4` construct. Do we want to modify loop body size computation to account for that?
- Another example of the same problem: the condition is a range check. The new pass finds it to be always successful, so it changes the condition to be the `Opaque4` node. Now, the next round of loop predication would find the condition to be subject to predication. But now the opaque node is in the way. Do we modify loop predication so it can handle those new `Opaque4` conditions? How: is the predicate a condition with an `Opaque4` as well? What about assert predicates? What about range check elimination: what should it do with the `Opaque4` conditions?
- Say the condition is the only one in the loop. Removing it would enable vectorization. How does superword deal with the condition that shouldn't be there anymore but is still there?
------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2724728187 From fyang at openjdk.org Fri Mar 14 13:46:58 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Mar 2025 13:46:58 GMT Subject: RFR: 8352011: RISC-V: Two IR tests fail after JDK-8351662 In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 08:01:52 GMT, SendaoYan wrote: >> Hi, please review this small change fixing two IR tests. >> >> These two tests were enabled for riscv64 by JDK-8351662. But they fail on riscv64 platforms where there is no support for RVV. Since they are expecting vector operations thus requires RVV support on this platform, we should add that as a requirement. >> >> Testing: same tests will be skipped on riscv64 platforms without RVV after this change. Tagging @Hamlin-Li > > LGTM @sendaoYan @Hamlin-Li : Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24048#issuecomment-2724752279 From fyang at openjdk.org Fri Mar 14 13:46:59 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Mar 2025 13:46:59 GMT Subject: Integrated: 8352011: RISC-V: Two IR tests fail after JDK-8351662 In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 07:46:40 GMT, Fei Yang wrote: > Hi, please review this small change fixing two IR tests. > > These two tests were enabled for riscv64 by JDK-8351662. But they fail on riscv64 platforms where there is no support for RVV. Since they are expecting vector operations thus requires RVV support on this platform, we should add that as a requirement. > > Testing: same tests will be skipped on riscv64 platforms without RVV after this change. Tagging @Hamlin-Li This pull request has now been integrated. 
Changeset: 985ca127 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/985ca1270e8d9bc041e50c2e9dd22bfeb0113e6e Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod 8352011: RISC-V: Two IR tests fail after JDK-8351662 Reviewed-by: syan, mli ------------- PR: https://git.openjdk.org/jdk/pull/24048 From roland at openjdk.org Fri Mar 14 14:11:52 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 14 Mar 2025 14:11:52 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v2] In-Reply-To: References: <52OYoC5__FdcN8OLwVgdNlb6Fz_IFo8UyKy3GUp5DiM=.708f1ee8-dbbb-4abf-8de0-d94b3b1e2ef6@github.com> Message-ID: On Tue, 18 Feb 2025 12:30:21 GMT, Roland Westrelin wrote: > For this bug, I think a more general fix is to try to compare the type of the `Phi` with that of the input it is going to be replaced with. If the former is not wider than the latter then we add a `CastNode`, since the cast is only about value range, not strict dependency, we can use `CarryDependency` instead of `UnconditionalDependency`. Am I right? There's no `CarryDependency`. Is your question about `RegularDependency` or `StrongDependency`? About the `Phi` type: I'm not sure I understood the comment correctly. Are you suggesting the fix shouldn't be triggered by pre/main/post insertion but rather whenever the test you mention passes? When then? igvn? Or are you suggesting to only apply the fix during pre/main/post insertion when the test you mention passes? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23617#issuecomment-2724816338 From thartmann at openjdk.org Fri Mar 14 14:13:54 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 14 Mar 2025 14:13:54 GMT Subject: RFR: 8350485: C2: factor out common code in Node::grow() and Node::out_grow() [v3] In-Reply-To: <_rgS_qoyZt6mhsO0oEnVEhFw0fYGirVUpvjqFUokkJ8=.f220fbf7-b801-4ceb-a8b8-40a055fe072d@github.com> References: <_rgS_qoyZt6mhsO0oEnVEhFw0fYGirVUpvjqFUokkJ8=.f220fbf7-b801-4ceb-a8b8-40a055fe072d@github.com> Message-ID: On Wed, 12 Mar 2025 10:07:38 GMT, Saranya Natarajan wrote: >> Node::grow() and Node::out_grow() are copy-pasted from each other and their core logic could be factored out into a third function or at least cleaned up. Hence, the fix includes a function Node::array_resize() that implements the core logic of Node::grow() and Node::out_grow(). >> >> Link to the GitHub Actions run, which had no failures: https://github.com/sarannat/jdk/actions/runs/13677508359 > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > 8350485: Addressing review comments with code formatting and fixing/removing comments Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23928#pullrequestreview-2685696405 From rcastanedalo at openjdk.org Fri Mar 14 15:08:52 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 14 Mar 2025 15:08:52 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: References: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> Message-ID: On Fri, 14 Mar 2025 09:54:53 GMT, Roberto Casta?eda Lozano wrote: >> @robcasloz I disagree, I would expect the `memory_type` of a `StoreC` into a `long[]` to be something that means "a part of a `long[]`", which should be `T_LONG` if the store is guaranteed to be enclosed in a single `long`, or `T_VOID` otherwise. While we are trying to store 2 bytes into the memory, the thing in the memory is neither a `short` nor a `char`. > >> I would expect the `memory_type` of a `StoreC` into a `long[]` to be something that means "a part of a `long[]`" > > If that was the intended meaning of `MemNode::memory_type()`, wouldn't the function be redundant, because we can retrieve that information from `MemNode::adr_type()` already? > @robcasloz Yes that's right. Then `MemNode::memory_type()` does not refer to the thing in memory at all, but the thing that is about to interact with the memory. Yes, that matches my understanding. > * We should rename it to `MemNode::value_type()` or `MemNode::value_basic_type()` I agree, it would be good to do this (in a follow-up RFE). I like `MemNode::value_basic_type()` best. > It is simply incorrect to use it to reason about the thing in the memory in this problem, and using adr_type is the correct fix. > > To be clear, I don't think having `StoreSNode` would solve any issue. I can `StoreS` into a `char[]`, and `StoreC` into a `short[]` and we are back at the same issue. I agree that using `adr_type()` (the solution proposed in this changeset) seems more robust. 
The alternative of using `memory_type()` and introducing a `StoreS` node assumes for correctness that the array fill optimization does not succeed for mismatched stores such as those you mention (e.g. `StoreS` into a `char[]`). If it did, I agree using `memory_type()` would be incorrect even after introducing a `StoreS` node. But so far, I haven't found any counterexample, i.e. any way to produce an array-filling loop with such a mismatched store that would be accepted by the array fill optimization. My attempts include using memory segments and Unsafe. In all cases, the array fill analysis in `PhaseIdealLoop::match_fill_loop` fails to recognize the loops due to different address computation patterns. Do you have any other idea/suggestion to trigger the array fill optimization using mismatched array stores? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2724980394 From duke at openjdk.org Fri Mar 14 15:11:14 2025 From: duke at openjdk.org (Marc Chevalier) Date: Fri, 14 Mar 2025 15:11:14 GMT Subject: RFR: 8335708: C2: assert(!dead_nodes) failed: using nodes must be reachable from root Message-ID: In CCP, we transform the nodes going up (toward inputs) starting from root and safepoints because infinite loops can be reachable from the root, but not co-reachable from the root, that is one can follow def-use from root to the loop, but not the use-def from root to loop. For more details, see: https://github.com/openjdk/jdk/blob/4cf63160ad575d49dbe70f128cd36aba22b8f2ff/src/hotspot/share/opto/phaseX.cpp#L2063-L2070 Since we are specifically marking nodes as useful if they are above a safepoint, the check that no dead nodes must be there anymore must also consider nodes above a safepoint as alive: the same criterion must apply. We should nevertheless not start from a safepoint killed by CCP. About the test, I use this trick found in `TestInfiniteLoopCCP` because I indeed need a really infinite loop, but I want a terminating test. 
The crash is not deterministic, as it needs StressIGVN, so I did a bit of stats. Using a little helper script, on 100 runs, 69 runs fail as in the JBS ticket and 31 are successful (so 0 fail in another way). After the fix, I find 100 successes. And thanks to @eme64 who extracted such a concise reproducer. ------------- Commit messages: - Don't start on removed safepoints. - Start detection of useful nodes also at safepoints Changes: https://git.openjdk.org/jdk/pull/23977/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23977&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335708 Stats: 98 lines in 5 files changed: 86 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/23977.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23977/head:pull/23977 PR: https://git.openjdk.org/jdk/pull/23977 From psandoz at openjdk.org Fri Mar 14 17:41:59 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 14 Mar 2025 17:41:59 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 10:48:09 GMT, Rohit Arul Raj wrote: > This change enables optimized SIMD sort for AMD Zen 4 (AVX2) & Zen 5 (AVX512). > > JTREG Tests: Completed Tier1 & Tier2 tests on Zen4 & Zen5 - No Regressions. > > Attaching ArraySort performance data for Zen4 & Zen5. > [Zen4-ArraySort-Data.txt](https://github.com/user-attachments/files/19245831/Zen4-ArraySort-Data.txt) > [Zen5-ArraySort-Data.txt](https://github.com/user-attachments/files/19245833/Zen5-ArraySort-Data.txt) src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4331: > 4329: snprintf(ebuf_, sizeof(ebuf_), > 4330: ((VM_Version::is_intel() || (VM_Version::is_amd() && (VM_Version::cpu_family() > 0x19))) > 4331: && VM_Version::supports_avx512dq()) ? "avx512_sort" : "avx2_sort"); Perhaps factor the expression to a separate method rather than repeat it three times? Can we add some constant with a descriptive name for the CPU family rather than directly using 0x19? 
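
A rough sketch of what that refactoring could look like follows. It is standalone C++ rather than HotSpot code: `Vendor`, `CpuInfo`, `kAmdFamily19h`, `use_avx512_sort` and `sort_library_name` are hypothetical names, and only the `0x19` value and the two library name strings are taken from the quoted snippet.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the VM_Version queries used in the quoted snippet.
enum class Vendor { Intel, AMD, Other };

struct CpuInfo {
  Vendor vendor;
  std::uint32_t family;  // CPU family as reported by the processor
  bool has_avx512dq;     // whether AVX512DQ is supported
};

// Named constant replacing the bare 0x19 from the quoted code (hypothetical name).
constexpr std::uint32_t kAmdFamily19h = 0x19;

// Single place for the condition that the quoted code repeats:
// (Intel, or AMD with a family above 0x19) and AVX512DQ support.
bool use_avx512_sort(const CpuInfo& cpu) {
  const bool vendor_ok =
      cpu.vendor == Vendor::Intel ||
      (cpu.vendor == Vendor::AMD && cpu.family > kAmdFamily19h);
  return vendor_ok && cpu.has_avx512dq;
}

// Library name selection built on top of the shared predicate.
const char* sort_library_name(const CpuInfo& cpu) {
  return use_avx512_sort(cpu) ? "avx512_sort" : "avx2_sort";
}
```

With a helper of this kind, each call site would only need the predicate or the name, and the family threshold would be documented in one place.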
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24053#discussion_r1996006886 From eastigeevich at openjdk.org Fri Mar 14 18:34:54 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 14 Mar 2025 18:34:54 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 22:23:23 GMT, Evgeny Astigeevich wrote: > This benchmark is used to check performance impact of the code cache being sparse. > > We use C2 compiler to compile the same Java method multiple times to produce as many code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods. > > Results: code region size 2M (2097152) bytes > - Intel Xeon Platinum 8259CL > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |19.577 |0.619 |us/op | | > |128 |32 |4 |22.968 |0.314 |us/op |17.30% | > |128 |48 |3 |22.245 |0.388 |us/op |13.60% | > |128 |64 |2 |23.874 |0.84 |us/op |21.90% | > |128 |80 |2 |23.786 |0.231 |us/op |21.50% | > |128 |96 |1 |26.224 |1.16 |us/op |34% | > |128 |112 |1 |27.028 |0.461 |us/op |38.10% | > |256 |1 |256 |47.43 |1.146 |us/op | | > |256 |32 |8 |63.962 |1.671 |us/op |34.90% | > |256 |48 |5 |63.396 |0.247 |us/op |33.70% | > |256 |64 |4 |66.604 |2.286 |us/op |40.40% | > |256 |80 |3 |59.746 |1.273 |us/op |26% | > |256 |96 |3 |63.836 |1.034 |us/op |34.60% | > |256 |112 |2 |63.538 |1.814 |us/op |34% | > |512 |1 |512 |172.731 |4.409 |us/op | | > |512 |32 |16 |206.772 |6.229 |us/op |19.70% | > |512 |48 |11 |215.275 |2.228 |us/op |24.60% | > |512 |64 |8 |212.962 |2.028 |us/op |23.30% | > 
|512 |80 |6 |201.335 |12.519 |us/op |16.60% | > |512 |96 |5 |198.133 |6.502 |us/op |14.70% | > |512 |112 |5 |193.739 |3.812 |us/op |12.20% | > |768 |1 |768 |325.154 |5.048 |us/op | | > |768 |32 |24 |346.298 |20.196 |us/op |6.50% | > |768 |48 |16 |350.746 |2.931 |us/op |7.90% | > |768 |64 |12 |339.445 |7.927 |us/op |4.40% | > |768 |80 |10 |347.408 |7.355 |us/op |6.80% | > |768 |96 |8 |340.983 |3.578 |us/op |4.90% | > |768 |112 |7 |353.949 |2.98 |us/op |8.90% | > |1024 |1 |1024 |368.352 |5.961 |us/op | | > |1024 |32 |32 |463.822 |6.274 |us/op |25.90% | > |1024 |48 |21 |457.674 |15.144 |us/op |24.20% | > |1024 |64 |16 |477.694 |0.986 |us/op |29.70% | > |1024 |80 |13 |484.901 |32.601 |us/op |31.60% | > |1024 |96 |11 |480.8 |27.088 |us/op |30.50% | > |1024 |112 |9 |474.416 |10.053 |us/op |28.80% | > > - AArch64 Neoverse N1 > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |25.297 |0.792 |us/op | | > |128 |32 |4 |31.451... 
AMD 4th Gen EPYC (Genoa) results

|activeMethodCount |groupCount |Methods/Group |Score |Error |Diff |
|--- |--- |--- |--- |--- |--- |
|128 |1 |128 |14.71 |0.042 | |
|128 |32 |4 |19.381 |0.04 |31.80% |
|128 |48 |3 |19.998 |0.099 |35.90% |
|128 |64 |2 |20.965 |0.097 |42.50% |
|128 |80 |2 |20.988 |0.121 |42.70% |
|128 |96 |1 |21.442 |0.141 |45.80% |
|128 |112 |1 |20.985 |0.05 |42.70% |
|256 |1 |256 |31.282 |0.072 | |
|256 |32 |8 |41.516 |0.252 |32.70% |
|256 |48 |5 |43.29 |0.45 |38.40% |
|256 |64 |4 |45.71 |0.321 |46.10% |
|256 |80 |3 |45.682 |0.325 |46% |
|256 |96 |3 |47.03 |0.168 |50.30% |
|256 |112 |2 |47.761 |0.609 |52.70% |
|512 |1 |512 |69.02 |0.742 | |
|512 |32 |16 |97.437 |0.9 |41.20% |
|512 |48 |11 |97.472 |0.481 |41.20% |
|512 |64 |8 |102.799 |0.519 |48.90% |
|512 |80 |6 |104.942 |0.278 |52% |
|512 |96 |5 |106.29 |0.182 |54% |
|512 |112 |5 |109.224 |0.49 |58.20% |
|768 |1 |768 |114.981 |1.51 | |
|768 |32 |24 |155.305 |0.995 |35.10% |
|768 |48 |16 |155.688 |0.362 |35.40% |
|768 |64 |12 |158.123 |0.443 |37.50% |
|768 |80 |10 |160.181 |0.879 |39.30% |
|768 |96 |8 |162.661 |0.177 |41.50% |
|768 |112 |7 |164.742 |0.342 |43.30% |
|1024 |1 |1024 |175.37 |1.244 | |
|1024 |32 |32 |206.198 |1.131 |17.60% |
|1024 |48 |21 |206.476 |1.19 |17.70% |
|1024 |64 |16 |211.615 |0.654 |20.70% |
|1024 |80 |13 |212.683 |0.928 |21.30% |
|1024 |96 |11 |214.103 |0.432 |22.10% |
|1024 |112 |9 |217.517 |1.197 |24% |

Intel 4th Gen Xeon (Sapphire Rapids) results

|activeMethodCount |groupCount |Methods/Group |Score |Error |Diff |
|--- |--- |--- |--- |--- |--- |
|128 |1 |128 |12.68 |0.01 | |
|128 |32 |4 |15.61 |0.04 |23.1% |
|128 |48 |3 |15.75 |0.05 |24.2% |
|128 |64 |2 |16.02 |0.11 |26.4% |
|128 |80 |2 |16.21 |0.12 |27.9% |
|128 |96 |1 |16.48 |0.27 |30.0% |
|128 |112 |1 |17.12 |0.59 |35.1% |
|256 |1 |256 |25.21 |0.15 | |
|256 |32 |8 |31.73 |0.35 |25.9% |
|256 |48 |5 |31.74 |0.37 |25.9% |
|256 |64 |4 |33.56 |0.59 |33.1% |
|256 |80 |3 |33.62 |0.91 |33.3% |
|256 |96 |3 |34.46 |0.92 |36.7% |
|256 |112 |2 |34.92 |0.99 |38.5% |
|512 |1 |512 |58.05 |0.96 | |
|512 |32 |16 |69.60 |1.59 |19.9% |
|512 |48 |11 |70.61 |1.11 |21.6% |
|512 |64 |8 |75.67 |1.25 |30.4% |
|512 |80 |6 |77.70 |1.59 |33.9% |
|512 |96 |5 |79.04 |1.29 |36.2% |
|512 |112 |5 |80.09 |0.92 |38.0% |
|768 |1 |768 |112.73 |1.66 | |
|768 |32 |24 |135.95 |4.22 |20.6% |
|768 |48 |16 |137.05 |2.00 |21.6% |
|768 |64 |12 |136.82 |2.06 |21.4% |
|768 |80 |10 |144.65 |5.60 |28.3% |
|768 |96 |8 |148.26 |6.11 |31.5% |
|768 |112 |7 |152.97 |5.36 |35.7% |
|1024 |1 |1024 |165.65 |3.10 | |
|1024 |32 |32 |209.07 |4.72 |26.2% |
|1024 |48 |21 |214.42 |4.14 |29.4% |
|1024 |64 |16 |219.80 |4.28 |32.7% |
|1024 |80 |13 |224.82 |4.11 |35.7% |
|1024 |96 |11 |230.94 |2.56 |39.4% |
|1024 |112 |9 |234.45 |3.49 |41.5% |

------------- PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2725396685 PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2725427533 From duke at openjdk.org Fri Mar 14 20:49:42 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 14 Mar 2025 20:49:42 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v3] In-Reply-To: References: Message-ID: <8l4e6nqzNukJ6st0fEkLwKqlF35stq_W9ph831eo8w4=.6cbb2172-b35a-4d27-bab7-1d104c9f993b@github.com> > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests.
New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Share immutable data between copied nmethods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/69d941b4..80734cd3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=01-02 Stats: 32 lines in 2 files changed: 19 ins; 4 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Fri Mar 14 20:49:42 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 14 Mar 2025 20:49:42 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v2] In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 12:04:29 GMT, Evgeny Astigeevich wrote: >> src/hotspot/share/code/nmethod.cpp line 1404: >> >>> 1402: memcpy(nm_copy, this, size()); >>> 1403: >>> 1404: // Allocate memory and copy immutable data from C heap >> >> A new immutable data block is allocated for the new nmethod. Would it be possible to reuse the old one instead? This could help reduce memory allocation overhead, though it would make the logic more complicated. The destroyed nmethod should consider whether it is the last one holding the immutable nmethod blob. Probably, concurrent modification issues could arise. What do you think? > >> A new immutable data block is allocated for the new nmethod. Would it be possible to reuse the old one instead? ... What do you think? > > You are reading our minds :) > Yes, we have discussed this. This sharing must be thread safe. We need to research whether it won't increase complexity. I added a reference counter to the end of immutable data. 
The only time the value is accessed/updated is during nmethod creation or destruction both of which hold the `CodeCache_lock` so I believe concurrent modifications should not be an issue ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1996254997 From duke at openjdk.org Fri Mar 14 21:57:21 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 14 Mar 2025 21:57:21 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Fix build issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/80734cd3..7b448c6c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=02-03 Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From dlong at openjdk.org Fri Mar 14 22:05:59 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 14 Mar 2025 22:05:59 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v3] In-Reply-To: <8l4e6nqzNukJ6st0fEkLwKqlF35stq_W9ph831eo8w4=.6cbb2172-b35a-4d27-bab7-1d104c9f993b@github.com> References: <8l4e6nqzNukJ6st0fEkLwKqlF35stq_W9ph831eo8w4=.6cbb2172-b35a-4d27-bab7-1d104c9f993b@github.com> Message-ID: On Fri, 14 Mar 2025 20:49:42 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
> > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Share immutable data between copied nmethods src/hotspot/share/code/nmethod.hpp line 587: > 585: address immutable_data_references_begin () const { return _immutable_data + _immutable_data_references_offset ; } > 586: address immutable_data_references_end () const { return immutable_data_end(); } > 587: If we are going to add typed fields to this data, maybe we should put it in a struct/class header at the beginning so we can access the field directly? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1996341389 From kvn at openjdk.org Sat Mar 15 00:00:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 15 Mar 2025 00:00:53 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor [v2] In-Reply-To: References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: <55-EqzBO8KU7Euo3or9B-D-2Pl0HH_jAbFR7taF1iZo=.01bde00a-f5f4-41b8-a091-942ef00449ad@github.com> On Fri, 14 Mar 2025 10:32:01 GMT, Christian Hagedorn wrote: >> This patch cleans the Parse and Template Assertion Predicate elimination code up. We now use a single `PredicateVisitor` and share code in a new `EliminateUselessPredicates` class which contains the code previously found in `PhaseIdealLoop::eliminate_useless_predicates()`. >> >> ### Unified Logic to Clean Up Parse and Template Assertion Predicates >> We now use the following algorithm: >> https://github.com/openjdk/jdk/blob/5e4b6ca0ddafa80eee60690caacd257b74305d4e/src/hotspot/share/opto/predicates.cpp#L1174-L1179 >> >> This is different from the old algorithm where we used a single boolean state `_useless`. 
But that does no longer work because when we first mark Template Assertion Predicates useless, we are no longer visiting them when iterating through predicates: >> >> https://github.com/openjdk/jdk/blob/a21fa463c4f8d067c18c09a072f3cdfa772aea5e/src/hotspot/share/opto/predicates.hpp#L704-L708 >> >> We therefore require a third state. Thus, I introduced a new tri-state `PredicateState` that provides a special `MaybeUseful` value which we can set each Predicate to. >> >> #### Ignoring Useless Parse Predicates >> While working on this patch, I've noticed that we are always visiting Parse Predicates - even when they useless. We should change that to align it with what we have for the other Predicates (changed in JDK-8351280). To make this work, we also replace the `_useless` state in `ParsePredicateNode` with a new `PredicateState`. >> >> #### Sharing Code for Parse and Template Assertion Predicates >> With all the mentioned changes in place, I could nicely share code for the elimination of Parse and Template Assertion Predicates in `EliminateUselessPredicates` by using templates. The following additional changes were required: >> >> - Changing the template parameter of `_template_assertion_predicate_opaques` to the more specific `OpaqueTemplateAssertionPredicateNode` type. >> - Adding accessor methods to get the Predicate lists from `Compile`. >> - Updating `ParsePredicate::mark_useless()` to pass in `PhaseIterGVN`, as done for Assertion Predicates >> >> Note that we still do not directly replace the useless Predicates but rather mark them useless as initiated by JDK-8351280. >> >> ### Other Included Changes >> - During the various refactoring steps, I somehow dropped the code to add newly cloned Template Assertion Predicate to the `_template_assertion_predicate_opaques` list. It was done directly in the old cloning methods. This is not relevant for correctness but could ... 
> > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Introduce predicates_enums.hpp This looks good to me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24013#pullrequestreview-2687097149 From sviswanathan at openjdk.org Sat Mar 15 00:56:25 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 15 Mar 2025 00:56:25 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input [v3] In-Reply-To: References: Message-ID: > Float.float16ToFloat generates wrong vectorized code in product build and asserts in fastdebug/debug when argument is of type byte, int, or long array. The short term solution is to not auto vectorize in these cases. > > Review comments are welcome. > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Review comments from Emanuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23939/files - new: https://git.openjdk.org/jdk/pull/23939/files/70ab0acc..8069f4b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23939&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23939&range=01-02 Stats: 24 lines in 1 file changed: 18 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23939.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23939/head:pull/23939 PR: https://git.openjdk.org/jdk/pull/23939 From sviswanathan at openjdk.org Sat Mar 15 00:56:25 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 15 Mar 2025 00:56:25 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input [v3] In-Reply-To: References: <1TIUz8O4xjCwMArQYWlPJ4qRR9SBVpux0cceH9m2X5k=.521532a4-9ea7-4031-aa98-a60ce2c8982a@github.com> Message-ID: On Mon, 10 Mar 2025 10:13:55 GMT, 
Emanuel Peter wrote: >> Thanks a lot @vnkozlov for the review and approval. > > @sviswa7 thanks for looking at this! The fix looks good, there are just a few comments about the test :) @eme64 @jatin-bhateja Your review comments are handled. Please take a look. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23939#issuecomment-2726075326 From sviswanathan at openjdk.org Sat Mar 15 00:56:25 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 15 Mar 2025 00:56:25 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input [v2] In-Reply-To: <_cQR4s9TEmTN7kEl88euP4PIneD_syVmCCvaz82Exf4=.f2f008e5-186d-4844-9e6d-950d01dd1b9b@github.com> References: <_cQR4s9TEmTN7kEl88euP4PIneD_syVmCCvaz82Exf4=.f2f008e5-186d-4844-9e6d-950d01dd1b9b@github.com> Message-ID: On Wed, 12 Mar 2025 14:15:56 GMT, Sandhya Viswanathan wrote: >> test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 78: >> >>> 76: @IR(counts = {IRNode.VECTOR_CAST_HF2F, "> 0"}, >>> 77: applyIfOr = {"UseCompactObjectHeaders", "false", "AlignVector", "false"}, >>> 78: applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}, >> >> Can you kindly justify the need for the compact object headers check? It will mainly impact the pre-loop trip count computation. AlignVector should be sufficient since it's a whitelisted option > > This check is taken from compiler/vectorization/TestFloatConversionsVector.java, which also has float16 conversion tests, to stay in sync. If I remove the `UseCompactObjectHeaders` check, the test will start failing for people working on compact object headers, so it is good to keep it there. As I mentioned before, it is also good to stay in sync with the other Float16ToFloat conversion test.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1996470864 From sviswanathan at openjdk.org Sat Mar 15 00:56:25 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 15 Mar 2025 00:56:25 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input [v2] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 07:37:17 GMT, Emanuel Peter wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments > > test/hotspot/jtreg/compiler/vectorization/TestFloat16ToFloatConv.java line 113: > >> 111: >> 112: @Test >> 113: // Not vectorized due to JDK-8350835 > > That's very non-descriptive. Actually, that is the current bug, so this is not even a future RFE that intends to fix it. > Can you please say why it is not vectorizing now, and what might be possible conditions when it would be ok to vectorize in the future? Could we even file an RFE for this? Added descriptive comments. RFE filed at https://bugs.openjdk.org/browse/JDK-8352093. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23939#discussion_r1996470360 From kvn at openjdk.org Sat Mar 15 00:57:58 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 15 Mar 2025 00:57:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 21:57:21 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. 
The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix build issues A few new comments. make/hotspot/lib/CompileJvm.gmk line 201: > 199: DISABLED_WARNINGS_gcc_jvmtiTagMap.cpp := stringop-overflow, \ > 200: DISABLED_WARNINGS_gcc_macroAssembler_ppc_sha.cpp := unused-const-variable, \ > 201: DISABLED_WARNINGS_gcc_nmethod.cpp := class-memaccess, \ Why do you need this? src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 93: > 91: void trampoline_stub_Relocation::pd_fix_owner_after_move() { > 92: NativeCall* call = nativeCall_at(owner()); > 93: assert(call->raw_destination() == owner(), "destination should be empty"); Why was it removed? src/hotspot/share/code/nmethod.cpp line 1407: > 1405: // Increment number of references to immutable data to share it between nmethods > 1406: if (immutable_data_size() > 0) { > 1407: (*immutable_data_references_begin())++; Please use an inline nmethod function that casts to `int` to update the counter. src/hotspot/share/code/nmethod.cpp line 1709: > 1707: } > 1708: #endif > 1709: memset(immutable_data_references_begin(), 1, oopSize); Don't use `memset`; use an inline nmethod function that sets the counter with the correct cast to `int`. src/hotspot/share/code/nmethod.cpp line 2289: > 2287: > 2288: if (_immutable_data != blob_end()) { > 2289: long _immutable_data_references = *immutable_data_references_begin(); Why `long`? It should be `int` on all platforms (we still have 32-bit ARM). Checking the current value and updating it should be done by inline nmethod functions.
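The point about `memset` and typed access can be sketched in isolation. This is a minimal standalone illustration, not HotSpot code: the helper names (`ref_count`, `set_ref_count`, `inc_ref_count`, `dec_ref_count`) and the 16-byte buffer are assumptions made for this example, and the accessors eventually used in the PR may look different. It only shows that `memset` stores the given value in every byte, so a counter initialized that way does not read back as 1 once it is treated as an `int`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical accessors: the counter is an int stored in the last
// sizeof(int) bytes of a data block, addressed through its end pointer.
static int ref_count(const uint8_t* data_end) {
  int v;
  std::memcpy(&v, data_end - sizeof(int), sizeof(int));  // alignment-safe read
  return v;
}

static void set_ref_count(uint8_t* data_end, int v) {
  std::memcpy(data_end - sizeof(int), &v, sizeof(int));
}

static int inc_ref_count(uint8_t* data_end) {
  int v = ref_count(data_end) + 1;
  set_ref_count(data_end, v);
  return v;
}

static int dec_ref_count(uint8_t* data_end) {
  int v = ref_count(data_end) - 1;
  set_ref_count(data_end, v);
  return v;
}

// Initializes the counter with memset(..., 1, ...) and reads it back as an int.
// Every byte becomes 0x01, so the int is not 1.
static int memset_counter_value() {
  uint8_t block[16] = {0};
  uint8_t* end = block + sizeof(block);
  std::memset(end - sizeof(int), 1, sizeof(int));
  return ref_count(end);
}

// Initializes the counter through the typed helpers: share once, release once.
static int typed_counter_value() {
  uint8_t block[16] = {0};
  uint8_t* end = block + sizeof(block);
  set_ref_count(end, 1);      // first owner
  inc_ref_count(end);         // a copy shares the data: 2
  return dec_ref_count(end);  // one owner goes away: back to 1
}
```

Keeping all reads and writes behind one pair of typed helpers also means the width of the counter is decided in a single place, which is the concern raised about `long` versus `int` on 32-bit platforms.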
src/hotspot/share/code/nmethod.hpp line 253: > 251: int _speculations_offset; > 252: #endif > 253: int _immutable_data_references_offset; Do we really need this? Can we simply use `_immutable_data_end - sizeof(int)` and make sure that `_immutable_data_end` has big enough alignment (which you have already) src/hotspot/share/code/nmethod.hpp line 338: > 336: ); > 337: > 338: nmethod(nmethod& nm); No need this ------------- PR Review: https://git.openjdk.org/jdk/pull/23573#pullrequestreview-2687109649 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1996454101 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1996454602 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1996469711 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1996470327 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1996471618 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1996469194 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1996461584 From kvn at openjdk.org Sat Mar 15 00:57:59 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 15 Mar 2025 00:57:59 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 19:01:12 GMT, Vladimir Kozlov wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix build issues > > src/hotspot/share/code/nmethod.cpp line 1399: > >> 1397: { >> 1398: debug_only(NoSafepointVerifier nsv;) >> 1399: assert_locked_or_safepoint(CodeCache_lock); > > Is this lock enough to prevent GC scan it before you finish initializing it? My question is related to `_state` field value. During usual nmethod creation the `_state` is `not_installed`. nmethod you are coping has `in_use` state. Someone may see this state before all fields are set. 
That is why I am asking if `CodeCache_lock` prevents any other VM's threads see it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1996458756 From kvn at openjdk.org Sat Mar 15 00:58:00 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 15 Mar 2025 00:58:00 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v3] In-Reply-To: References: <8l4e6nqzNukJ6st0fEkLwKqlF35stq_W9ph831eo8w4=.6cbb2172-b35a-4d27-bab7-1d104c9f993b@github.com> Message-ID: On Fri, 14 Mar 2025 22:03:44 GMT, Dean Long wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Share immutable data between copied nmethods > > src/hotspot/share/code/nmethod.hpp line 587: > >> 585: address immutable_data_references_begin () const { return _immutable_data + _immutable_data_references_offset ; } >> 586: address immutable_data_references_end () const { return immutable_data_end(); } >> 587: > > If we are going to add typed fields to this data, maybe we should put it in a struct/class header at the beginning so we can access the field directly? I am not sure adding one integers field justifies new structure. It is better to keep this counter at the end of this block since it is accessed only few times. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1996464592 From dlong at openjdk.org Sat Mar 15 01:05:59 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 15 Mar 2025 01:05:59 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> Message-ID: On Wed, 12 Mar 2025 20:38:25 GMT, Dean Long wrote: >>> Do you think those high values are not expected ? >> >> Sorry, my mistake. 
As @dean-long pointed out they are to be expected with very small values of `target->_freq` > > I think it would still be helpful to understand what kind of situations cause these extreme values. I know there are places where we have to adjust for problematic 0 counts, so I'm wondering if something like that is happening here. Yes, CFGLoop::scale_freq() is turning a 0 _freq value into MIN_BLOCK_FREQUENCY, which is 1.e-35f. Dividing by such a small number can overflow a 32-bit int. Maybe this is a never-taken out edge of an infinite loop? It might be a bug to give this edge an effectively infinite frequency percentage. This will cause CFGEdge::to_infrequent() to report false, when maybe it should return true. I don't understand this code well enough to decide. Maybe a loop expert can tell us if having this frequency overflow here is harmless or not. Tagging @rwestrel and @TobiHartmann ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r1996475459 From dlong at openjdk.org Sat Mar 15 04:01:58 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 15 Mar 2025 04:01:58 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> Message-ID: On Sat, 15 Mar 2025 01:03:39 GMT, Dean Long wrote: >> I think it would still be helpful to understand what kind of situations cause these extreme values. I know there are places where we have to adjust for problematic 0 counts, so I'm wondering if something like that is happening here. > > Yes, CFGLoop::scale_freq() is turning a 0 _freq value into MIN_BLOCK_FREQUENCY, which is 1.e-35f. Dividing by such a small number can overflow a 32-bit int. Maybe this is a never-taken out edge of an infinite loop? It might be a bug to give this edge an effectively infinite frequency percentage. 
This will cause CFGEdge::to_infrequent() to report false, when maybe it should return true. I don't understand this code well enough to decide. Maybe a loop expert can tell us if having this frequency overflow here is harmless or not. Tagging @rwestrel and @TobiHartmann This code seems to be really old, from https://bugs.openjdk.org/browse/JDK-6743900. Tagging reviewers @tkrodriguez and @vnkozlov . To me, the formula for `to_pct` looks wrong. I would expect `b->_freq` and `target->_freq `to be multiplied together, not divided. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r1996561783 From syan at openjdk.org Sun Mar 16 03:21:11 2025 From: syan at openjdk.org (SendaoYan) Date: Sun, 16 Mar 2025 03:21:11 GMT Subject: RFR: 8351938: C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 12:07:50 GMT, Christian Hagedorn wrote: > We currently only print a compilation bailout with `-XX:+PrintCompilation`: > > 7782 90 b 4 Test::main (19 bytes) > 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout > > But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. > > Thanks, > Christian Hi, this PR seems make same tests intermittent [fails](https://bugs.openjdk.org/browse/JDK-8352108). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24031#issuecomment-2727160425 From syan at openjdk.org Sun Mar 16 09:06:26 2025 From: syan at openjdk.org (SendaoYan) Date: Sun, 16 Mar 2025 09:06:26 GMT Subject: RFR: 8352108: "COMPILE SKIPPED: concurrent class loading" cause tests intermittent fails Message-ID: <00d2ViOx7EtrXWue7zG_ucZ5570efYP8ItX1c2kAUXI=.2eaa1414-a62a-428b-a852-3a3d2055afac@github.com> Hi all, After [JDK-8351938](https://bugs.openjdk.org/browse/JDK-8351938) several tests intermittent fails, because JVM print additional message "COMPILE SKIPPED: concurrent class loading". 
I don't know why `task->directive()->PrintCompilationOption` set as `true` automatically, but add an extra check for that works for me. ------------- Commit messages: - 8352108: "COMPILE SKIPPED: concurrent class loading" cause tests intermittent fails Changes: https://git.openjdk.org/jdk/pull/24073/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24073&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352108 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24073.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24073/head:pull/24073 PR: https://git.openjdk.org/jdk/pull/24073 From syan at openjdk.org Sun Mar 16 15:27:37 2025 From: syan at openjdk.org (SendaoYan) Date: Sun, 16 Mar 2025 15:27:37 GMT Subject: RFR: 8352108: "COMPILE SKIPPED: concurrent class loading" cause tests intermittent fails [v2] In-Reply-To: <00d2ViOx7EtrXWue7zG_ucZ5570efYP8ItX1c2kAUXI=.2eaa1414-a62a-428b-a852-3a3d2055afac@github.com> References: <00d2ViOx7EtrXWue7zG_ucZ5570efYP8ItX1c2kAUXI=.2eaa1414-a62a-428b-a852-3a3d2055afac@github.com> Message-ID: > Hi all, > > After [JDK-8351938](https://bugs.openjdk.org/browse/JDK-8351938) several tests intermittent fails, because JVM print additional message "COMPILE SKIPPED: concurrent class loading". > > I don't know why `task->directive()->PrintCompilationOption` set as `true` automatically, but add an extra check for that works for me. 
SendaoYan has updated the pull request incrementally with one additional commit since the last revision: Set PrintCompilation to true when receive CompileCommandEnum::PrintCompilation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24073/files - new: https://git.openjdk.org/jdk/pull/24073/files/2b3160b6..55bf15fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24073&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24073&range=00-01 Stats: 5 lines in 2 files changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24073.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24073/head:pull/24073 PR: https://git.openjdk.org/jdk/pull/24073 From kvn at openjdk.org Sun Mar 16 19:20:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 16 Mar 2025 19:20:55 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> Message-ID: On Sat, 15 Mar 2025 03:59:07 GMT, Dean Long wrote: >> Yes, CFGLoop::scale_freq() is turning a 0 _freq value into MIN_BLOCK_FREQUENCY, which is 1.e-35f. Dividing by such a small number can overflow a 32-bit int. Maybe this is a never-taken out edge of an infinite loop? It might be a bug to give this edge an effectively infinite frequency percentage. This will cause CFGEdge::to_infrequent() to report false, when maybe it should return true. I don't understand this code well enough to decide. Maybe a loop expert can tell us if having this frequency overflow here is harmless or not. Tagging @rwestrel and @TobiHartmann > > This code seems to be really old, from https://bugs.openjdk.org/browse/JDK-6743900. Tagging reviewers @tkrodriguez and @vnkozlov . To me, the formula for `to_pct` looks wrong. I would expect `b->_freq` and `target->_freq `to be multiplied together, not divided. 
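The overflow discussed in the quoted text can be reproduced in a small standalone sketch. This is a simplified illustration and not the code in block.cpp: the constant mirrors the `MIN_BLOCK_FREQUENCY` value quoted in the thread (1.e-35f), while `edge_percentage` and `clamp_to_int` are hypothetical helpers introduced here. Converting a floating-point value that lies outside the range of `int` is undefined behavior in C++, which is what UBSan reports; clamping before the cast is shown only as one possible way to avoid it, not as the fix chosen in the PR.

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// Value quoted in the thread for MIN_BLOCK_FREQUENCY.
static const double kMinBlockFrequency = 1.e-35f;

// Hypothetical helper: percentage of the target block's executions
// that arrive through an edge taken edge_freq times.
static double edge_percentage(double edge_freq, double target_freq) {
  return (100.0 * edge_freq) / target_freq;
}

// Hypothetical helper: converts to int without leaving the representable range.
static int clamp_to_int(double v) {
  if (std::isnan(v)) return 0;
  if (v >= static_cast<double>(INT_MAX)) return INT_MAX;
  if (v <= static_cast<double>(INT_MIN)) return INT_MIN;
  return static_cast<int>(v);
}
```

With an edge frequency of 1.0 and a target frequency of `kMinBlockFrequency`, the quotient is on the order of 1e37, far above `INT_MAX`, so a plain `(int)` cast of that value would be undefined.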
`Block::_freq` is the number of times this block is executed per call of the method. It can be a big number for blocks in a loop and very small on an infrequent path. `succ_prob()` is calculated based on the frequencies of two blocks and/or the corresponding branch probability: [gcm.cpp#L2100](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/gcm.cpp#L2100) `freq = b->_freq * b->succ_prob(j)` is the number of times we take this outgoing path. So, to calculate the probability of taking this path in the `target` block, we divide `freq` by the number of times the `target` block is executed. This assumes that `b->_freq` <= `target->_freq`, which seems not to be true in this case and indicates a bug in how we calculate and update block frequencies. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r1997687585 From duke at openjdk.org Sun Mar 16 23:48:53 2025 From: duke at openjdk.org (Abdelhak Zaaim) Date: Sun, 16 Mar 2025 23:48:53 GMT Subject: RFR: 8352110: [ASAN] heap-use-after-free of task->directive()->PrintCompilationOption [v2] In-Reply-To: References: <00d2ViOx7EtrXWue7zG_ucZ5570efYP8ItX1c2kAUXI=.2eaa1414-a62a-428b-a852-3a3d2055afac@github.com> Message-ID: <4EqGvTJPGhG1nWw5vDAmDpSizOI-Kg4DZ0Bb6Dp1XtY=.e60eb49b-d539-4aa4-b65c-214f8448eac3@github.com> On Sun, 16 Mar 2025 15:27:37 GMT, SendaoYan wrote: >> Hi all, >> >> After [JDK-8351938](https://bugs.openjdk.org/browse/JDK-8351938) several tests intermittent fails, because JVM print additional message "COMPILE SKIPPED: concurrent class loading", and AddressSanitizer report 'heap-use-after-free' error. >> >> I don't know why `task->directive()->PrintCompilationOption` set as `true` automatically. This PR remove `task->directive()->PrintCompilationOption` usage, and set PrintCompilation to true when receive CompileCommandEnum::PrintCompilation.
> > SendaoYan has updated the pull request incrementally with one additional commit since the last revision: > > Set PrintCompilation to true when receive CompileCommandEnum::PrintCompilation Marked as reviewed by abdelhak-zaaim at github.com (no known OpenJDK username). ------------- PR Review: https://git.openjdk.org/jdk/pull/24073#pullrequestreview-2688863905 From xgong at openjdk.org Mon Mar 17 01:18:01 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 17 Mar 2025 01:18:01 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 09:28:58 GMT, Emanuel Peter wrote: >> The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. >> >> Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. >> >> This patch added the rearrange support for vector types with small lane count. 
Here are the main changes: >> - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) >> - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation >> - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one >> >> Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. >> >> 1) NEON >> >> JMH on panama-vector:vectorIntrinsics: >> >> Benchmark (size) Mode Cnt Units Before After Gain >> Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x >> Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x >> Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x >> Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x >> Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x >> Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x >> Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x >> Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x >> Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x >> Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x >> Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x >> Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.... > > But the testing on my side so far looks good. I'll rerun once you add your IR tests. Hi @eme64 , the IR test has been added. Could you please help test it one more time? Thanks a lot! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2727790383 From fyang at openjdk.org Mon Mar 17 01:37:51 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 17 Mar 2025 01:37:51 GMT Subject: RFR: 8352022: RISC-V: Support Zfa fminm_h/fmaxm_h for float16 In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 06:57:29 GMT, Anjian-Wen wrote: > For the support of float16, add the Zfa fminm/fmaxm with the type of float16 > this is still a draft which need test which related to https://bugs.openjdk.org/browse/JDK-8345298 Looks fine. My local hs:tier1-hs:tier2 test result is good (Qemu-system with UseZfh & UseZfa options enabled by default). ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24047#pullrequestreview-2688941444 From duke at openjdk.org Mon Mar 17 02:15:51 2025 From: duke at openjdk.org (Anjian-Wen) Date: Mon, 17 Mar 2025 02:15:51 GMT Subject: RFR: 8352022: RISC-V: Support Zfa fminm_h/fmaxm_h for float16 In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 01:35:06 GMT, Fei Yang wrote: >> For the support of float16, add the Zfa fminm/fmaxm with the type of float16 >> this related to https://bugs.openjdk.org/browse/JDK-8345298 > > Looks fine. My local hs:tier1-hs:tier2 test result is good (Qemu-system with UseZfh & UseZfa options enabled by default). @RealFYang Thanks for your review!
------------- PR Comment: https://git.openjdk.org/jdk/pull/24047#issuecomment-2727853776 From thartmann at openjdk.org Mon Mar 17 06:11:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 17 Mar 2025 06:11:52 GMT Subject: RFR: 8352110: [ASAN] heap-use-after-free of task->directive()->PrintCompilationOption [v2] In-Reply-To: References: <00d2ViOx7EtrXWue7zG_ucZ5570efYP8ItX1c2kAUXI=.2eaa1414-a62a-428b-a852-3a3d2055afac@github.com> Message-ID: On Sun, 16 Mar 2025 15:27:37 GMT, SendaoYan wrote: >> Hi all, >> >> After [JDK-8351938](https://bugs.openjdk.org/browse/JDK-8351938) several tests fail intermittently, because the JVM prints the additional message "COMPILE SKIPPED: concurrent class loading", and AddressSanitizer reports a 'heap-use-after-free' error. >> >> I don't know why `task->directive()->PrintCompilationOption` is set to `true` automatically. This PR removes the `task->directive()->PrintCompilationOption` usage, and sets PrintCompilation to true when receiving CompileCommandEnum::PrintCompilation. > > SendaoYan has updated the pull request incrementally with one additional commit since the last revision: > > Set PrintCompilation to true when receive CompileCommandEnum::PrintCompilation Changes requested by thartmann (Reviewer). src/hotspot/share/compiler/compilerOracle.cpp line 338: > 336: > 337: if (option == CompileCommandEnum::PrintCompilation) { > 338: PrintCompilation = true; But this will set `PrintCompilation` to true globally, right? The directive should only apply to a specific method.
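The scoping concern raised in the review comment above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `PrintScopeSketch`, `globalPrint`, `methodsWithPrint`, and the method names are hypothetical and do not correspond to HotSpot code. It merely contrasts a command that flips a process-wide flag with one that is recorded for a single method.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model; it is not HotSpot code and only mirrors the scoping question.
class PrintScopeSketch {
    // Stands in for a process-wide flag.
    static boolean globalPrint = false;
    // Stands in for per-method directives.
    static final Set<String> methodsWithPrint = new HashSet<>();

    // Variant A: the command flips the global flag, so every method is affected.
    static void applyGlobally(String method) {
        globalPrint = true;
    }

    // Variant B: the command is recorded for the named method only.
    static void applyPerMethod(String method) {
        methodsWithPrint.add(method);
    }

    static boolean shouldPrint(String method) {
        return globalPrint || methodsWithPrint.contains(method);
    }

    public static void main(String[] args) {
        applyPerMethod("Foo::bar");
        System.out.println(shouldPrint("Foo::bar"));  // true
        System.out.println(shouldPrint("Baz::qux"));  // false
        applyGlobally("Foo::bar");
        System.out.println(shouldPrint("Baz::qux"));  // true: the global flag leaks to other methods
    }
}
```

In this model, variant A reproduces the side effect questioned in the review: a command issued for one method changes the answer for every other method.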
------------- PR Review: https://git.openjdk.org/jdk/pull/24073#pullrequestreview-2689211832 PR Review Comment: https://git.openjdk.org/jdk/pull/24073#discussion_r1997995025 From xgong at openjdk.org Mon Mar 17 06:36:53 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 17 Mar 2025 06:36:53 GMT Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v3] In-Reply-To: References: Message-ID: <0g7Bcc5-ym31IxFP8zAQPpjXJYOoFj3g9ZZ-7VtSFeY=.fb99a88c-7b98-4ef2-8438-421b43216d76@github.com> On Fri, 14 Mar 2025 09:34:47 GMT, Xiaohong Gong wrote: >> Since PR [1] has added several new vector operations in VectorAPI and the X86 backend implementation for them, this patch adds the AArch64 backend part for NEON/SVE architectures. >> >> The performance of Vector API relative JMH micro benchmarks can improve about 70x ~ 95x on a NVIDIA Grace CPU, which is a 128-bit vector length sve2 architecture, with different UseSVE options. Here is the gain details: >> >> >> Benchmark (size) Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2 >> ByteMaxVector.SADD 1024 thrpt 30 80.69x 79.70x 80.534x >> ByteMaxVector.SADDMasked 1024 thrpt 30 84.08x 85.72x 85.901x >> ByteMaxVector.SSUB 1024 thrpt 30 80.46x 80.27x 81.063x >> ByteMaxVector.SSUBMasked 1024 thrpt 30 83.96x 85.26x 85.887x >> ByteMaxVector.SUADD 1024 thrpt 30 80.43x 80.36x 81.761x >> ByteMaxVector.SUADDMasked 1024 thrpt 30 83.40x 84.62x 85.199x >> ByteMaxVector.SUSUB 1024 thrpt 30 79.93x 79.22x 79.714x >> ByteMaxVector.SUSUBMasked 1024 thrpt 30 82.93x 85.02x 84.726x >> ByteMaxVector.UMAX 1024 thrpt 30 78.73x 77.39x 78.220x >> ByteMaxVector.UMAXMasked 1024 thrpt 30 82.62x 84.77x 85.531x >> ByteMaxVector.UMIN 1024 thrpt 30 79.04x 77.80x 78.471x >> ByteMaxVector.UMINMasked 1024 thrpt 30 83.11x 84.86x 86.126x >> IntMaxVector.SADD 1024 thrpt 30 83.11x 83.07x 83.183x >> IntMaxVector.SADDMasked 1024 thrpt 30 90.67x 91.80x 93.162x >> IntMaxVector.SSUB 1024 thrpt 30 83.37x 82.82x 
83.317x >> IntMaxVector.SSUBMasked 1024 thrpt 30 90.85x 92.87x 94.201x >> IntMaxVector.SUADD 1024 thrpt 30 82.76x 81.78x 82.679x >> IntMaxVector.SUADDMasked 1024 thrpt 30 90.49x 91.93x 93.155x >> IntMaxVector.SUSUB 1024 thrpt 30 82.92x 82.34x 82.525x >> IntMaxVector.SUSUBMasked 1024 thrpt 30 90.60x 92.12x 92.951x >> IntMaxVector.UMAX 1024 thrpt 30 82.40x 81.85x 82.242x >> IntMaxVector.UMAXMasked 1024 thrpt 30 90.30x 92.10x 92.587x >> IntMaxVector.UMIN 1024 thrpt 30 82.84x 81.43x 82.801x >> IntMaxVector.UMINMasked 1024 thrpt 30 90.43x 91.49x 92.678x >> LongMaxVector.SADD 102... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Fix IR test failure on X64 with UseAVX=1 Have checked that the test failure on linux-64 is not caused by this PR. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23608#issuecomment-2728341963 From chagedorn at openjdk.org Mon Mar 17 06:51:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 17 Mar 2025 06:51:59 GMT Subject: RFR: 8352110: [ASAN] heap-use-after-free of task->directive()->PrintCompilationOption [v2] In-Reply-To: References: <00d2ViOx7EtrXWue7zG_ucZ5570efYP8ItX1c2kAUXI=.2eaa1414-a62a-428b-a852-3a3d2055afac@github.com> Message-ID: On Mon, 17 Mar 2025 06:08:50 GMT, Tobias Hartmann wrote: >> SendaoYan has updated the pull request incrementally with one additional commit since the last revision: >> >> Set PrintCompilation to true when receive CompileCommandEnum::PrintCompilation > > src/hotspot/share/compiler/compilerOracle.cpp line 338: > >> 336: >> 337: if (option == CompileCommandEnum::PrintCompilation) { >> 338: PrintCompilation = true; > > But this will set `PrintCompilation` to true globally, right? The directive should only apply to a specific method. Yes, we only want to have it enabled for certain methods. Let's do a backout of JDK-8351938 instead to address the problems properly in a REDO. 
https://github.com/openjdk/jdk/pull/24074 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24073#discussion_r1998045288 From chagedorn at openjdk.org Mon Mar 17 06:52:51 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 17 Mar 2025 06:52:51 GMT Subject: RFR: 8352110: [BACKOUT] C2: Print compilation bailouts with PrintCompilation compile command Message-ID: Due to some failures, I think it's best to backout this and address the problems properly in a REDO. Thanks, Christian ------------- Commit messages: - Revert "8351938: C2: Print compilation bailouts with PrintCompilation compile command" Changes: https://git.openjdk.org/jdk/pull/24074/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24074&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352110 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24074.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24074/head:pull/24074 PR: https://git.openjdk.org/jdk/pull/24074 From epeter at openjdk.org Mon Mar 17 06:52:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Mar 2025 06:52:54 GMT Subject: RFR: 8335708: C2: assert(!dead_nodes) failed: using nodes must be reachable from root In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 09:12:44 GMT, Marc Chevalier wrote: > In CCP, we transform the nodes going up (toward inputs) starting from root and safepoints because infinite loops can be reachable from the root, but not co-reachable from the root, that is one can follow def-use from root to the loop, but not the use-def from root to loop. For more details, see: > https://github.com/openjdk/jdk/blob/4cf63160ad575d49dbe70f128cd36aba22b8f2ff/src/hotspot/share/opto/phaseX.cpp#L2063-L2070 > > Since we are specifically marking nodes as useful if they are above a safepoint, the check that no dead nodes must be there anymore must also consider nodes above a safepoint as alive: the same criterion must apply. 
We should nevertheless not start from a safepoint killed by CCP. > > About the test, I use this trick found in `TestInfiniteLoopCCP` because I indeed need a truly infinite loop, but I want a terminating test. The crash is not deterministic, as it needs StressIGVN, so I did a bit of stats. Using a little helper script, on 100 runs, 69 runs fail as in the JBS ticket and 31 are successful (so 0 fail in another way). After the fix, I find 100 successes. > > And thanks to @eme64 who extracted such a concise reproducer. Nice work, and thanks again for the offline discussion :) I left a few comments / suggestions below. Ah, and you could also consider changing the PR name. Maybe something like `graph verification must start at root and safepoints, just like CCP traversal`. Maybe you have an even better idea ;) src/hotspot/share/opto/compile.cpp line 4206: > 4204: uint stack_size = live_nodes() >> 4; > 4205: Node_List nstack(MAX2(stack_size, (uint) OptoNodeListSize)); > 4206: if (root_and_safepoints != nullptr) { Can you say in which cases we don't have `root_and_safepoints`? Why is it ok not to also start at SafePoint in those cases? test/hotspot/jtreg/compiler/loopopts/Test8335708.java line 1: > 1: /* Can you rename the test file? I think the new common practice is to give it a descriptive name, rather than just the bug number which is already tracked under `@bug 8335708` anyway ;) test/hotspot/jtreg/compiler/loopopts/Test8335708.java line 28: > 26: * @bug 8335708 > 27: * @summary Crash Compile::verify_graph_edges > 28: * @requires vm.debug == true & vm.flavor == "server" Can we find a way not to have this restriction? It could make sense to still execute this in product, or with other compilers. If the issue is with VM flags, then you can always use `-XX:+IgnoreUnrecognizedVMOptions`.
test/hotspot/jtreg/compiler/loopopts/Test8335708.java line 29: > 27: * @summary Crash Compile::verify_graph_edges > 28: * @requires vm.debug == true & vm.flavor == "server" > 29: * @library /test/lib Suggestion: Can you test if you actually need this line? test/hotspot/jtreg/compiler/loopopts/Test8335708.java line 35: > 33: * -XX:+StressIGVN -Xcomp > 34: * -XX:CompileCommand=compileonly,compiler.loopopts.Test8335708::mainTest > 35: * compiler.loopopts.Test8335708 Can you please add a run without any flags? Sometimes that allows other bugs to trigger, because it can then be used without any flags, or other flag combinations. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23977#pullrequestreview-2689275942 PR Review Comment: https://git.openjdk.org/jdk/pull/23977#discussion_r1998041679 PR Review Comment: https://git.openjdk.org/jdk/pull/23977#discussion_r1998036529 PR Review Comment: https://git.openjdk.org/jdk/pull/23977#discussion_r1998034747 PR Review Comment: https://git.openjdk.org/jdk/pull/23977#discussion_r1998038170 PR Review Comment: https://git.openjdk.org/jdk/pull/23977#discussion_r1998035486 From thartmann at openjdk.org Mon Mar 17 06:52:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 17 Mar 2025 06:52:52 GMT Subject: RFR: 8352110: [BACKOUT] C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 06:47:08 GMT, Christian Hagedorn wrote: > Due to some failures, I think it's best to backout this and address the problems properly in a REDO. > > Thanks, > Christian Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24074#pullrequestreview-2689291781 From epeter at openjdk.org Mon Mar 17 06:52:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Mar 2025 06:52:54 GMT Subject: RFR: 8335708: C2: assert(!dead_nodes) failed: using nodes must be reachable from root In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 06:45:13 GMT, Emanuel Peter wrote: >> In CCP, we transform the nodes going up (toward inputs) starting from root and safepoints because infinite loops can be reachable from the root, but not co-reachable from the root, that is one can follow def-use from root to the loop, but not the use-def from root to loop. For more details, see: >> https://github.com/openjdk/jdk/blob/4cf63160ad575d49dbe70f128cd36aba22b8f2ff/src/hotspot/share/opto/phaseX.cpp#L2063-L2070 >> >> Since we are specifically marking nodes as useful if they are above a safepoint, the check that no dead nodes must be there anymore must also consider nodes above a safepoint as alive: the same criterion must apply. We should nevertheless not start from a safepoint killed by CCP. >> >> About the test, I use this trick found in `TestInfiniteLoopCCP` because I indeed need a really infinite loop, but I want a terminating test. The crash is not deterministic, as it needs StressIGVN, so I did a bit of stats. Using a little helper script, on 100 runs, 69 runs fail as in the JBS ticket and 31 are successful (so 0 fail in another way). After the fix, I find 100 successes. >> >> And thanks to @eme64 who extracted such a concise reproducer. > > src/hotspot/share/opto/compile.cpp line 4206: > >> 4204: uint stack_size = live_nodes() >> 4; >> 4205: Node_List nstack(MAX2(stack_size, (uint) OptoNodeListSize)); >> 4206: if (root_and_safepoints != nullptr) { > > Can you say in which cases we don't have `root_and_safepoints`? Why is it ok not to also start at SafePoint in those cases? 
I think you should also say that we start the traversal from Root and Safepoints, just like during CCP. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23977#discussion_r1998043895 From syan at openjdk.org Mon Mar 17 07:01:59 2025 From: syan at openjdk.org (SendaoYan) Date: Mon, 17 Mar 2025 07:01:59 GMT Subject: RFR: 8352110: [BACKOUT] C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 06:47:08 GMT, Christian Hagedorn wrote: > Due to some failures, I think it's best to backout this and address the problems properly in a REDO. > > Thanks, > Christian Marked as reviewed by syan (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24074#pullrequestreview-2689312788 From syan at openjdk.org Mon Mar 17 07:05:09 2025 From: syan at openjdk.org (SendaoYan) Date: Mon, 17 Mar 2025 07:05:09 GMT Subject: RFR: 8352110: [ASAN] heap-use-after-free of task->directive()->PrintCompilationOption [v2] In-Reply-To: References: <00d2ViOx7EtrXWue7zG_ucZ5570efYP8ItX1c2kAUXI=.2eaa1414-a62a-428b-a852-3a3d2055afac@github.com> Message-ID: On Mon, 17 Mar 2025 06:48:51 GMT, Christian Hagedorn wrote: >> src/hotspot/share/compiler/compilerOracle.cpp line 338: >> >>> 336: >>> 337: if (option == CompileCommandEnum::PrintCompilation) { >>> 338: PrintCompilation = true; >> >> But this will set `PrintCompilation` to true globally, right? The directive should only apply to a specific method. > > Yes, we only want to have it enabled for certain methods. Let's do a backout of JDK-8351938 instead to address the problems properly in a REDO. > > https://github.com/openjdk/jdk/pull/24074 Okay, I think we should close this PR.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24073#discussion_r1998060638 From syan at openjdk.org Mon Mar 17 07:05:09 2025 From: syan at openjdk.org (SendaoYan) Date: Mon, 17 Mar 2025 07:05:09 GMT Subject: Withdrawn: 8352110: [ASAN] heap-use-after-free of task->directive()->PrintCompilationOption In-Reply-To: <00d2ViOx7EtrXWue7zG_ucZ5570efYP8ItX1c2kAUXI=.2eaa1414-a62a-428b-a852-3a3d2055afac@github.com> References: <00d2ViOx7EtrXWue7zG_ucZ5570efYP8ItX1c2kAUXI=.2eaa1414-a62a-428b-a852-3a3d2055afac@github.com> Message-ID: On Sun, 16 Mar 2025 09:01:42 GMT, SendaoYan wrote: > Hi all, > > After [JDK-8351938](https://bugs.openjdk.org/browse/JDK-8351938) several tests fail intermittently, because the JVM prints the additional message "COMPILE SKIPPED: concurrent class loading", and AddressSanitizer reports a 'heap-use-after-free' error. > > I don't know why `task->directive()->PrintCompilationOption` is set to `true` automatically. This PR removes the `task->directive()->PrintCompilationOption` usage, and sets PrintCompilation to true when receiving CompileCommandEnum::PrintCompilation. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/24073 From chagedorn at openjdk.org Mon Mar 17 07:11:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 17 Mar 2025 07:11:53 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor [v2] In-Reply-To: References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: On Fri, 14 Mar 2025 10:32:01 GMT, Christian Hagedorn wrote: >> This patch cleans the Parse and Template Assertion Predicate elimination code up. We now use a single `PredicateVisitor` and share code in a new `EliminateUselessPredicates` class which contains the code previously found in `PhaseIdealLoop::eliminate_useless_predicates()`.
>> >> ### Unified Logic to Clean Up Parse and Template Assertion Predicates >> We now use the following algorithm: >> https://github.com/openjdk/jdk/blob/5e4b6ca0ddafa80eee60690caacd257b74305d4e/src/hotspot/share/opto/predicates.cpp#L1174-L1179 >> >> This is different from the old algorithm where we used a single boolean state `_useless`. But that does no longer work because when we first mark Template Assertion Predicates useless, we are no longer visiting them when iterating through predicates: >> >> https://github.com/openjdk/jdk/blob/a21fa463c4f8d067c18c09a072f3cdfa772aea5e/src/hotspot/share/opto/predicates.hpp#L704-L708 >> >> We therefore require a third state. Thus, I introduced a new tri-state `PredicateState` that provides a special `MaybeUseful` value which we can set each Predicate to. >> >> #### Ignoring Useless Parse Predicates >> While working on this patch, I've noticed that we are always visiting Parse Predicates - even when they useless. We should change that to align it with what we have for the other Predicates (changed in JDK-8351280). To make this work, we also replace the `_useless` state in `ParsePredicateNode` with a new `PredicateState`. >> >> #### Sharing Code for Parse and Template Assertion Predicates >> With all the mentioned changes in place, I could nicely share code for the elimination of Parse and Template Assertion Predicates in `EliminateUselessPredicates` by using templates. The following additional changes were required: >> >> - Changing the template parameter of `_template_assertion_predicate_opaques` to the more specific `OpaqueTemplateAssertionPredicateNode` type. >> - Adding accessor methods to get the Predicate lists from `Compile`. >> - Updating `ParsePredicate::mark_useless()` to pass in `PhaseIterGVN`, as done for Assertion Predicates >> >> Note that we still do not directly replace the useless Predicates but rather mark them useless as initiated by JDK-8351280. 
>> >> ### Other Included Changes >> - During the various refactoring steps, I somehow dropped the code to add newly cloned Template Assertion Predicate to the `_template_assertion_predicate_opaques` list. It was done directly in the old cloning methods. This is not relevant for correctness but could ... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Introduce predicates_enums.hpp Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24013#issuecomment-2728406983 From epeter at openjdk.org Mon Mar 17 07:21:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Mar 2025 07:21:01 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v3] In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 10:04:50 GMT, Xiaohong Gong wrote: >> The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. >> >> Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. >> >> This patch added the rearrange support for vector types with small lane count. 
Here are the main changes: >> - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) >> - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation >> - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one >> >> Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. >> >> 1) NEON >> >> JMH on panama-vector:vectorIntrinsics: >> >> Benchmark (size) Mode Cnt Units Before After Gain >> Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x >> Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x >> Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x >> Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x >> Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x >> Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x >> Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x >> Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x >> Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x >> Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x >> Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x >> Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Merge branch 'jdk:master' into JDK-8350463 > - Add the IR test > - 8350463: AArch64: Add vector rearrange support for small lane count vectors > > The AArch64 vector rearrange implementation currently lacks support for > vector types with lane counts < 4 (see [1]). 
This limitation results in > significant performance gaps when running Long/Double vector benchmarks > on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to > other SVE and x86 platforms. > > Vector rearrange operations depend on vector shuffle inputs, which used > byte array as payload previously. The minimum vector lane count of 4 for > byte type on AArch64 imposed this limitation on rearrange operations. > However, vector shuffle payload has been updated to use vector-specific > data types (e.g., `int` for `IntVector`) (see [2]). This change enables > us to remove the lane count restriction for vector rearrange operations. > > This patch added the rearrange support for vector types with small lane > count. Here are the main changes: > - Added AArch64 match rule support for `VectorRearrange` with smaller > lane counts (e.g., `2D/2S`) > - Relocated NEON implementation from ad file to c2 macro assembler file > for better handling of complex implementation > - Optimized temporary register usage in NEON implementation for > short/int/float types from two registers to one > > Following is the performance improvement data of several Vector API JMH > benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the > same JMH with other vector types remains unchanged. > > 1) NEON > > JMH on panama-vector:vectorIntrinsics: > ``` > Benchmark (size) Mode Cnt Units Before After Gain > Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x > Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x > Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x > Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x > Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x > Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x > Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x > Int64Vector.slic... Thanks for adding the IR test! 
Actually I just checked, and it seems we don't really have many tests for `rearrange`, and that's not great. The only test I could find was: `test/hotspot/jtreg/compiler/vectorapi/TestTwoVectorPermute.java` But the coverage here is not sufficient at all. I think it only covers 256-bit vectors. Your test now covers 64-bit and 128-bit cases, but not for all types. I guess you are only targeting small types, so I don't want to burden you with writing tests for all sizes now. But we should definitely file an RFE that tests `rearrange` more thoroughly. Maybe I just didn't find the tests, so please check again yourself first ;) I have not reviewed the actual aarch64 code, I'll leave that to experts in that field ;) test/hotspot/jtreg/compiler/vectorapi/VectorRearrangeTest.java line 28: > 26: * @bug 8350463 > 27: * @summary AArch64: Add vector rearrange support for small lane count vectors > 28: * @requires (os.simpleArch == "x64" & vm.cpu.features ~= ".*avx.*") | os.arch=="aarch64" Can you please remove the requirement here, and instead move restrictions to just the IR rules? That way the test can run on all platforms, but the IR verification is only executed on the specified platforms. test/hotspot/jtreg/compiler/vectorapi/VectorRearrangeTest.java line 94: > 92: fsrc[i] = random.nextFloat(); > 93: dsrc[i] = random.nextDouble(); > 94: } Could you please use `Generators.java`? This makes sure that we have more "interesting" values in the distribution. test/hotspot/jtreg/compiler/vectorapi/VectorRearrangeTest.java line 115: > 113: .intoArray(bdst, i); > 114: } > 115: } Is there any verification here that the result is correct? I usually do that with a `GOLD` value computed once at the beginning (in interpreter mode because C2 compilation only happens later), and then a `@Check` method where I compare the results.
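The two suggestions above (more "interesting" input values and a `GOLD` reference compared in a check step) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the test from the PR: it uses neither the IR framework, nor `Generators.java`, nor the Vector API, and every name in it (`RearrangeGoldSketch`, `rearrangeAll`, `check`, `SPECIAL`) is hypothetical. The scalar loop stands in for the vectorized kernel, with result lane `n` taking source lane `SHUFFLE[n]`.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-alone sketch; none of these names come from the PR or the test library.
class RearrangeGoldSketch {
    static final int SIZE = 1024;
    static final int LANES = 2;            // e.g. a 64-bit vector of float
    static final int[] SHUFFLE = {1, 0};   // result lane n takes source lane SHUFFLE[n]
    static final float[] SPECIAL = {
        0.0f, -0.0f, Float.NaN, Float.POSITIVE_INFINITY,
        Float.NEGATIVE_INFINITY, Float.MIN_VALUE, Float.MAX_VALUE
    };

    static final float[] SRC = new float[SIZE];
    static final float[] GOLD;

    static {
        // Fixed seed so that the sketch is reproducible.
        Random random = new Random(42);
        for (int i = 0; i < SIZE; i++) {
            // Roughly a quarter special values, the rest arbitrary bit patterns,
            // instead of nextFloat(), which only yields values in [0, 1).
            SRC[i] = (random.nextInt(4) == 0)
                   ? SPECIAL[random.nextInt(SPECIAL.length)]
                   : Float.intBitsToFloat(random.nextInt());
        }
        // Reference result, computed once up front.
        GOLD = rearrangeAll(SRC);
    }

    // Scalar stand-in for the kernel under test: applies the shuffle to each group of LANES elements.
    static float[] rearrangeAll(float[] src) {
        float[] dst = new float[src.length];
        for (int i = 0; i + LANES <= src.length; i += LANES) {
            for (int lane = 0; lane < LANES; lane++) {
                dst[i + lane] = src[i + SHUFFLE[lane]];
            }
        }
        return dst;
    }

    // Plays the role of a check method: compare a later result against GOLD.
    static void check(float[] result) {
        // Arrays.equals treats NaN as equal to NaN and distinguishes 0.0f from -0.0f.
        if (!Arrays.equals(GOLD, result)) {
            throw new AssertionError("rearrange result differs from GOLD");
        }
    }

    public static void main(String[] args) {
        check(rearrangeAll(SRC));
    }
}
```

In a real IR test the kernel would be the vectorized method and the comparison would live in a `@Check` method; here both are collapsed into static helpers to keep the sketch self-contained.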
test/hotspot/jtreg/compiler/vectorapi/VectorRearrangeTest.java line 219: > 217: TestFramework testFramework = new TestFramework(); > 218: testFramework.setDefaultWarmup(10000) > 219: .addFlags("--add-modules=jdk.incubator.vector", "-XX:-TieredCompilation") Suggestion: .addFlags("--add-modules=jdk.incubator.vector") I don't think that disabling `TieredCompilation` should be necessary here. Or was it? ------------- PR Review: https://git.openjdk.org/jdk/pull/23790#pullrequestreview-2689316798 PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r1998060161 PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r1998061285 PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r1998061844 PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r1998062644 From xgong at openjdk.org Mon Mar 17 07:29:58 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 17 Mar 2025 07:29:58 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v3] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 07:18:22 GMT, Emanuel Peter wrote: > Thanks for adding the IR test! > > Actually I just checked, and it seems we don't really have many tests for `rearrange`, and that's not great. The only test I could find was: `test/hotspot/jtreg/compiler/vectorapi/TestTwoVectorPermute.java` But the coverage here is not sufficient at all. I think it only covers 256-bit vectors. > > Your test now covers 64-bit and 128-bit cases, but not for all types. I guess you are only targeting small types, so I don't want to burden you with writing tests for all sizes now. But we should definitely file an RFE that tests `rearrange` more thoroughly. > > Maybe I just didn't find the tests, so please check again yourself first ;) > > I have not reviewed the actual aarch64 code, I'll leave that to experts in that field ;) Thanks for looking at this PR again @eme64 !
Vector API has its own jtreg tests under `test/jdk/jdk/incubator/vector/`. I double checked that it has the `rearrange` test for all vector species. Please see one of the tests here: https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Long128VectorTests.java#L4954 That's also why I did not add the correctness tests in the IR test file. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2728437681 From epeter at openjdk.org Mon Mar 17 07:38:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Mar 2025 07:38:53 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 09:28:58 GMT, Emanuel Peter wrote: >> The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. >> >> Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. >> >> This patch added the rearrange support for vector types with small lane count.
Here are the main changes: >> - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) >> - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation >> - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one >> >> Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. >> >> 1) NEON >> >> JMH on panama-vector:vectorIntrinsics: >> >> Benchmark (size) Mode Cnt Units Before After Gain >> Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x >> Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x >> Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x >> Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x >> Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x >> Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x >> Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x >> Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x >> Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x >> Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x >> Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x >> Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.... > > But the testing on my side so far looks good. I'll rerun once you add your IR tests. > Thanks for looking at this PR again @eme64 ! Vector API has its own jtreg tests under `test/jdk/jdk/incubator/vector/`. I double checked that it has the `rearrange` test for all vector species. 
Please see one of the tests here: https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Long128VectorTests.java#L4954 That's also why I did not add the correctness tests in the IR test file. Alright. I think result verification would still be good practice, and not too difficult to do using a `@Check` method and `Verify.java` for comparing the resulting arrays. But I leave that up to you. In my experience, the VectorAPI test coverage is not as good as I first thought, see the list of bugs I recently found: https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC So adding a little more rigor to your IR test could catch possible bugs that the existing tests simply do not cover. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2728450332 From xgong at openjdk.org Mon Mar 17 07:38:53 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 17 Mar 2025 07:38:53 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: <7nuoTZqQ0-26ALKNF2gji3B0jkHElRrSbAdf0vxVbmk=.b39f8266-2f05-4796-9431-bf93e5bbbe11@github.com> On Mon, 17 Mar 2025 07:33:16 GMT, Emanuel Peter wrote: > > Thanks for looking at this PR again @eme64 ! Vector API has its own jtreg tests under `test/jdk/jdk/incubator/vector/`. I double checked that it has the `rearrange` test for all vector species. Please see one of the tests here: https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Long128VectorTests.java#L4954 That's also why I did not add the correctness tests in the IR test file. > > Alright. I think result verification would still be good practice, and not too difficult to do using a `@Check` method and `Verify.java` for comparing the resulting arrays. But I leave that up to you.
In my experience, the VectorAPI test coverage is not as good as I first thought, see the list of bugs I recently found: https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC > > So adding a little more rigor to your IR test could catch possible bugs that the existing tests simply do not cover. OK. Sounds good to me. I will add the correctness check to have a double check. Thanks for your suggestion! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2728456101 From xgong at openjdk.org Mon Mar 17 07:38:57 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 17 Mar 2025 07:38:57 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v3] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 07:01:58 GMT, Emanuel Peter wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - Merge branch 'jdk:master' into JDK-8350463 >> - Add the IR test >> - 8350463: AArch64: Add vector rearrange support for small lane count vectors >> >> The AArch64 vector rearrange implementation currently lacks support for >> vector types with lane counts < 4 (see [1]). This limitation results in >> significant performance gaps when running Long/Double vector benchmarks >> on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to >> other SVE and x86 platforms. >> >> Vector rearrange operations depend on vector shuffle inputs, which used >> byte array as payload previously. The minimum vector lane count of 4 for >> byte type on AArch64 imposed this limitation on rearrange operations. >> However, vector shuffle payload has been updated to use vector-specific >> data types (e.g., `int` for `IntVector`) (see [2]). This change enables >> us to remove the lane count restriction for vector rearrange operations. 
>> >> This patch added the rearrange support for vector types with small lane >> count. Here are the main changes: >> - Added AArch64 match rule support for `VectorRearrange` with smaller >> lane counts (e.g., `2D/2S`) >> - Relocated NEON implementation from ad file to c2 macro assembler file >> for better handling of complex implementation >> - Optimized temporary register usage in NEON implementation for >> short/int/float types from two registers to one >> >> Following is the performance improvement data of several Vector API JMH >> benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the >> same JMH with other vector types remains unchanged. >> >> 1) NEON >> >> JMH on panama-vector:vectorIntrinsics: >> ``` >> Benchmark (size) Mode Cnt Units Before After Gain >> Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x >> Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x >> Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x >> Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x >> Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x >> Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x >> Int64Vecto... > > test/hotspot/jtreg/compiler/vectorapi/VectorRearrangeTest.java line 94: > >> 92: fsrc[i] = random.nextFloat(); >> 93: dsrc[i] = random.nextDouble(); >> 94: } > > Could you please use `Generators.java`? This makes sure that we have more "interesting" values in the distribution. Thanks for your suggestion! I will take a look at this interface. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r1998101209 From chagedorn at openjdk.org Mon Mar 17 08:04:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 17 Mar 2025 08:04:01 GMT Subject: RFR: 8352110: [BACKOUT] C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: References: Message-ID: <07tx0lRR7SJ7tCx-jZWVTSAEh7ZhL9I8Ed6n7_4vnCc=.ac295a8b-ab87-43af-a4df-cae2ffa2eba9@github.com> On Mon, 17 Mar 2025 06:47:08 GMT, Christian Hagedorn wrote: > Due to some failures, I think it's best to backout this and address the problems properly in a REDO. > > Thanks, > Christian Thanks for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24074#issuecomment-2728502559 From chagedorn at openjdk.org Mon Mar 17 08:04:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 17 Mar 2025 08:04:01 GMT Subject: Integrated: 8352110: [BACKOUT] C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: References: Message-ID: <_itYvLXrVB1aRbKHaIuHSd1VI7C926edTPBmfFip0C0=.47f9276d-54f2-47f3-bbb1-70f17b6f6697@github.com> On Mon, 17 Mar 2025 06:47:08 GMT, Christian Hagedorn wrote: > Due to some failures, I think it's best to backout this and address the problems properly in a REDO. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: e29d4055 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/e29d405504560eee46b4d98b90476deb45c32668 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8352110: [BACKOUT] C2: Print compilation bailouts with PrintCompilation compile command Reviewed-by: thartmann, syan ------------- PR: https://git.openjdk.org/jdk/pull/24074 From rcastanedalo at openjdk.org Mon Mar 17 09:46:52 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 17 Mar 2025 09:46:52 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: References: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> Message-ID: <4970lfg_SYiaN_khYDdnxKE2_6UgCw0n0T8uG-LZ2n0=.618bffee-9db9-483a-ba13-55992d7d9374@github.com> On Fri, 14 Mar 2025 15:06:33 GMT, Roberto Castañeda Lozano wrote: > The alternative of using `memory_type()` and introducing a `StoreS` node assumes for correctness that the array fill optimization does not succeed for mismatched stores such as those you mention (e.g. `StoreS` into a `char[]`). After some more thought, I lean towards just disabling the `OptimizeFill` optimization for mismatched stores. It does not succeed today anyway due to accidental reasons (brittleness in pattern matching), so disabling it for this case should not have any other impact than making us more confident in the correctness of the optimization. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2728817048 From adinn at openjdk.org Mon Mar 17 10:10:03 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 17 Mar 2025 10:10:03 GMT Subject: RFR: 8351627: C2 AArch64 ROR/ROL: assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value In-Reply-To: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> References: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> Message-ID: On Fri, 14 Mar 2025 09:43:15 GMT, Xiaohong Gong wrote: > The following assertion fails on AArch64: > > > Internal Error (jdk/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:2991), pid=3822987, tid=3823007 > assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value > > > with a simple Vector API case: > > public static IntVector test() { > IntVector iv = IntVector.zero(IntVector.SPECIES_128); > return iv.lanewise(VectorOperators.ROR, iv); > } > > > On AArch64, vector `ROR/ROL` (rotate right/left) operations are implemented with a combination of shifts. Please see the pattern for `ROR`: > > > lsr dst1, src, cnt // unsigned right shift > lsl dst2, src, bitSize - cnt // left shift > orr dst, dst1, dst2 // logical or > > where `bitSize` is the element type width (e.g. `32` for `int`). In above case, `cnt` is a zero constant, resulting in a left shift of 32 (`bitSize - 0`), which exceeds the instruction's valid shift count range and triggers the assertion. To fix this, we need to mask the shift count to ensure it stays within valid range when calculating shift counts for rotate operations: `shiftCnt = shiftCnt & (bitSize - 1)`. > > Note that the mask is only necessary for constant shift counts. This not only fixes the assertion failure, but also allows `ROR/ROL src, 0` to be optimized to `src` directly. > > For vector variables as shift counts, the masking can be safely omitted because: > 1. 
Vector shift instructions that take a vector register as the shift count may not automatically apply modulo arithmetic based on register size. When the shift count is `32` for int type, the result may be either `zeros` or `src`. However, this doesn't affect correctness for rotate since the final result is combined with `src` using a logical `OR` operation. > 2. It saves a vector logical `AND` for masking, which is friendly to the performance. Looks good. test/hotspot/jtreg/compiler/vectorapi/TestRotateWithZero.java line 80: > 78: > 79: private static void rotateLeftWithZero() { > 80: IntVector vzero = IntVector.zero(I_SPECIES); This local is unused. ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24051#pullrequestreview-2689851145 PR Review Comment: https://git.openjdk.org/jdk/pull/24051#discussion_r1998382139 From mli at openjdk.org Mon Mar 17 11:49:56 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Mar 2025 11:49:56 GMT Subject: RFR: 8351876: RISC-V: enable and fix some float round tests In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 08:21:33 GMT, Fei Yang wrote: > Looks fine to me modulo one minor comment. Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24015#issuecomment-2729188484 From mli at openjdk.org Mon Mar 17 11:49:57 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Mar 2025 11:49:57 GMT Subject: Integrated: 8351876: RISC-V: enable and fix some float round tests In-Reply-To: References: Message-ID: <-fHHzdfSwHVNFuJNmlMfpchB1Rao-ZIUHbZW87i9ME8=.23946037-cd6c-4f3d-9b52-89946ed55a34@github.com> On Wed, 12 Mar 2025 17:01:14 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > It's a follow-up of https://github.com/openjdk/jdk/pull/23985. > > Thanks This pull request has now been integrated. 
Changeset: dbf47d6c Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/dbf47d6c6c9573a143e0158a0664dd3bbab8e251 Stats: 13 lines in 2 files changed: 8 ins; 0 del; 5 mod 8351876: RISC-V: enable and fix some float round tests Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/24015 From eastigeevich at openjdk.org Mon Mar 17 12:42:55 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 17 Mar 2025 12:42:55 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 21:57:21 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix build issues src/hotspot/share/code/nmethod.cpp line 1464: > 1462: if (nm->lookup_code_blob_type() == code_blob_type) { > 1463: return nm; > 1464: } No need for this check. We can relocate nmethods with the same code heap. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1998645541 From eastigeevich at openjdk.org Mon Mar 17 12:51:55 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 17 Mar 2025 12:51:55 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 21:57:21 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix build issues test/hotspot/jtreg/compiler/whitebox/RelocateNMethodMultiplePaths.java line 22: > 20: * or visit www.oracle.com if you need additional information or have any > 21: * questions. > 22: */ Correct copyright: /* * Copyright Amazon.com Inc. or its affiliates. All Rights Reserved. * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. * * This code is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License version 2 only, as * published by the Free Software Foundation. 
* * This code is distributed in the hope that it will be useful, but WITHOUT * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License * version 2 for more details (a copy is included in the LICENSE file that * accompanied this code). * * You should have received a copy of the GNU General Public License version * 2 along with this work; if not, write to the Free Software Foundation, * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. * * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA * or visit www.oracle.com if you need additional information or have any * questions. * */ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1998661727 From eastigeevich at openjdk.org Mon Mar 17 12:56:10 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 17 Mar 2025 12:56:10 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 21:57:21 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
> > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix build issues test/lib/jdk/test/whitebox/WhiteBox.java line 494: > 492: relocateNMethodTo0(method, type); > 493: } > 494: public native void relocateAllNMethods(); Please remove `relocateAllNMethods`. We don't need it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1998670645 From eastigeevich at openjdk.org Mon Mar 17 13:10:02 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 17 Mar 2025 13:10:02 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 21:44:00 GMT, Evgeny Astigeevich wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix build issues > > test/hotspot/jtreg/compiler/whitebox/RelocateAllNMethods.java line 98: > >> 96: import jdk.test.whitebox.WhiteBox; >> 97: >> 98: public class RelocateAllNMethods { > > Relocating all nmethods without a destination looks confusing. You need to specify where to relocate the nmethods. Also, there would not be much compiled code. It mostly tests relocation of code which cannot be relocated. > IMO it's better to compile a bunch of methods, relocate them, and execute them. Please remove the test. We don't need it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1998691669 From eastigeevich at openjdk.org Mon Mar 17 13:10:01 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 17 Mar 2025 13:10:01 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 21:57:21 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). 
It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix build issues src/hotspot/share/prims/whitebox.cpp line 1640: > 1638: WB_END > 1639: > 1640: WB_ENTRY(void, WB_RelocateAllNMethods(JNIEnv* env)) We don't need it. test/hotspot/jtreg/compiler/whitebox/DeoptimizeRelocatedNMethod.java line 2: > 1: /* > 2: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. Fix the copyright. test/hotspot/jtreg/compiler/whitebox/RelocateAndDeoptmizeAllNMethods.java line 98: > 96: import jdk.test.whitebox.WhiteBox; > 97: > 98: public class RelocateAndDeoptmizeAllNMethods { Please rewrite the test as follows, which is more practical: 1. Compile a set of Java methods with C1. There should be a call graph of the methods with one entry point which produces a result. 2. Call the entry point to produce a result. 3. Relocate the nmethods. 4. Call the entry point to produce a new result. 5. Compare the results. 6. Deoptimize the nmethods. 7. Check that the Java methods don't have nmethods. Do the same for C2. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1998694877 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1998692849 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1998689847 From eastigeevich at openjdk.org Mon Mar 17 13:12:55 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 17 Mar 2025 13:12:55 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 21:57:21 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Fix build issues src/hotspot/share/code/nmethod.cpp line 1471: > 1469: // Clear inline caches before acquiring any locks > 1470: VM_ClearNMethodICs clear_nmethod_ics(nm); > 1471: VMThread::execute(&clear_nmethod_ics); This does not look correct. Why are you doing this? What are you trying to fix? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1998700403 From adinn at openjdk.org Mon Mar 17 14:08:37 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 17 Mar 2025 14:08:37 GMT Subject: RFR: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 [v2] In-Reply-To: References: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> <3F6Qa5TvjBGUXvCBukcqJHG4Q4UgoqU5NmP2uMOHQAM=.cd9894a4-1d51-4b4d-9648-d5855d50d97d@github.com> <7A298xkGx3GJ2Pt6yIEK_DABrosqrbpYXH905hVUyxc=.a7555ea3-ef86-4bd5-a569-c1757ac6f1ab@github.com> Message-ID: On Thu, 13 Mar 2025 12:34:25 GMT, Ferenc Rakoczi wrote: >>> @ferakocz I have modified your generator code to employ vector sequences and auxiliaries that handle iterative loads, stores and math/logic operations over vector sequences. It would be useful to have a review of the code from you and also for you to test it (see comments below re testing) The rewrite has allowed much of the generator logic to be condensed into calls to simple auxiliaries which provides a better mid-level view of how the code is structured. It has also clarified the register use. I think this will be a lot easier for maintainers to understand. A few further comments: >>> >>> 1. I have added some asserts to the montmul operations to ensure that input and output register sequences are either disjoint or overlapping. There may be further opportunities to add asserts in a follow-up. >>> 2. One thing I noted (commented on in code) after switching to passing vector sequences rather than relying on fixed mappings is that some reloading of q and qinv inside loops is unnecessary as the code in the loop does not write the relevant vectors. I left the code as is so that I could check that the generated code is identical to the original but I will move the relevant load outside the loop before pushing. >>> 3. 
I compared before and after disassemblies of the generated code and it is unchanged modulo routine `dilithiumDecomposePoly`. For that intrinsic your generator code wrote successive, intermediate results into the next unused set of 4 vectors, which are in most cases used subsequently to hold a non-temporary result needed by a later computation. My code always writes intermediate results into the last set of 4 vectors (declared as `VSeq<4> vtmp(20)`). As a result my generated code has the same structure but a slightly different register mapping to yours. I don't believe this affects performance but the change does make it clearer how the computed values are being used. >>> 4. As well as comparing disassemblies for the generated code I verified the patch by running test `jdk/sun/security/provider/acvp/ML_DSA_Test.java`. However, I noted a problem with relying on the test as currently implemented since it did not appear to capture some errors in my code. I re-ran the test under the debugger and found that only one of the intrinsics was being exercised (dilithiumAlmostNtt). I confirmed this by adding -XX:+PrintCompilation to the test command line. It seems that all the calls to other intrinsic candidates occurred from the interpreter and did not run ofte... > >> @ferakocz I see that the test problem is being addressed as part of the x86 ML_DSA PR. > > Oh, so you have already found that :-) . @ferakocz @dean-long Would you be able to review this reworking of the original ML_DSA intrinsic code? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24026#issuecomment-2729631305 From adinn at openjdk.org Mon Mar 17 14:08:37 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 17 Mar 2025 14:08:37 GMT Subject: RFR: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 [v5] In-Reply-To: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> References: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> Message-ID: <2VksBtd_XqgUIQpirjTmAkXUpVZPtahtmLfIoEVRC0A=.895aa101-0baa-461c-970d-b95f146a4f9a@github.com> > This PR reworks the existing AArch64 ML_DSA intrinsic code generator to make it clearer to read and easier to maintain. Andrew Dinn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - use references and const to avoid VSeq copying and fix int array arg issue - fix comment - fix invalid register argument - fix errors in comments - fix whitespace errors - Clearer implementation of AArch64 dilithium generator ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24026/files - new: https://git.openjdk.org/jdk/pull/24026/files/9ee9eecc..b46226d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24026&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24026&range=03-04 Stats: 50306 lines in 940 files changed: 25144 ins; 15820 del; 9342 mod Patch: https://git.openjdk.org/jdk/pull/24026.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24026/head:pull/24026 PR: https://git.openjdk.org/jdk/pull/24026 From duke at openjdk.org Mon Mar 17 14:32:09 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 17 Mar 2025 14:32:09 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger 
load [v3] In-Reply-To: References: Message-ID: <_9WlWTtn2xrN1CSeJjc7jI3tuhnoTYlwqDvxK7-e0w0=.4ebc1862-8863-48aa-8374-24b8e36db012@github.com> > WIP. > > It worked for cases in the TestMergeLoads.java and can observe performance improvement in MergeLoadBench.getIntB . Need to check more cases. kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Add tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24023/files - new: https://git.openjdk.org/jdk/pull/24023/files/7a1c524d..fb6cd3d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=01-02 Stats: 304 lines in 3 files changed: 293 ins; 4 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From rraj at openjdk.org Mon Mar 17 15:28:12 2025 From: rraj at openjdk.org (Rohit Arul Raj) Date: Mon, 17 Mar 2025 15:28:12 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: References: Message-ID: > This change enables optimized SIMD sort for AMD Zen 4 (AVX2) & Zen 5 (AVX512). > > JTREG Tests: Completed Tier1 & Tier2 tests on Zen4 & Zen5 - No Regressions. > > Attaching ArraySort performance data for Zen4 & Zen5. 
> [Zen4-ArraySort-Data.txt](https://github.com/user-attachments/files/19245831/Zen4-ArraySort-Data.txt) > [Zen5-ArraySort-Data.txt](https://github.com/user-attachments/files/19245833/Zen5-ArraySort-Data.txt) Rohit Arul Raj has updated the pull request incrementally with one additional commit since the last revision: create a separate method to check for cpu's supporting avx512 version of simd sort ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24053/files - new: https://git.openjdk.org/jdk/pull/24053/files/8de2e2c2..42011911 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24053&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24053&range=00-01 Stats: 14 lines in 3 files changed: 6 ins; 5 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24053/head:pull/24053 PR: https://git.openjdk.org/jdk/pull/24053 From epeter at openjdk.org Mon Mar 17 15:41:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Mar 2025 15:41:46 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable Message-ID: With the Template Framework, I'm generating IR tests randomly. But random code can always hit bailouts in compilation, and make code not compilable any more. We should have a way to disable this check, and just gracefully continue to execute the tests. 
To allow a single test method to be `not compilable`: https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L160-L161 To allow all test methods to be `not compilable`: https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L140-L144 See also this documentation in the code: https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/compiler/lib/ir_framework/Test.java#L88-L94 ------------- Commit messages: - restrict test - improve test with @Run - typo - copyright years - fix test - more documentation - minor fixes - refactor with NotCompilableIRMethod - more documentation - fix tests, and impl with MethodNotCompilableException - ... and 16 more: https://git.openjdk.org/jdk/compare/82eb7806...ce40f140 Changes: https://git.openjdk.org/jdk/pull/24049/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24049&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351952 Stats: 433 lines in 16 files changed: 408 ins; 0 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/24049.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24049/head:pull/24049 PR: https://git.openjdk.org/jdk/pull/24049 From psandoz at openjdk.org Mon Mar 17 15:50:52 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 17 Mar 2025 15:50:52 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 15:28:12 GMT, Rohit Arul Raj wrote: >> This change enables optimized SIMD sort for AMD Zen 4 (AVX2) & Zen 5 (AVX512). >> >> JTREG Tests: Completed Tier1 & Tier2 tests on Zen4 & Zen5 - No Regressions. >> >> Attaching ArraySort performance data for Zen4 & Zen5. 
>> [Zen4-ArraySort-Data.txt](https://github.com/user-attachments/files/19245831/Zen4-ArraySort-Data.txt) >> [Zen5-ArraySort-Data.txt](https://github.com/user-attachments/files/19245833/Zen5-ArraySort-Data.txt) > > Rohit Arul Raj has updated the pull request incrementally with one additional commit since the last revision: > > create a separate method to check for cpu's supporting avx512 version of simd sort Looks good, thank you for updating. I am not a proper HotSpot reviewer, so I bumped up the number of required reviewers; a HotSpot developer needs to give it a quick review. ------------- Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24053#pullrequestreview-2691098623 From epeter at openjdk.org Mon Mar 17 16:03:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Mar 2025 16:03:11 GMT Subject: RFR: 8352020: [CompileFramework] enable compilation for VectorAPI Message-ID: During work on [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942) I discovered that it is currently not possible to compile VectorAPI code because it is still in incubator mode and needs the flag "--add-modules=jdk.incubator.vector" for "javac". Also: "javac" can produce warnings, and that leads to issues like this: [JDK-8351998](https://bugs.openjdk.org/browse/JDK-8351998). We should allow such warnings; they are not compile failures. Example: javac --add-modules=jdk.incubator.vector Test.java warning: [incubating] using incubating module(s): jdk.incubator.vector 1 warning I added an example test as well. 
------------- Commit messages: - whitespace - JDK-8352020 Changes: https://git.openjdk.org/jdk/pull/24082/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24082&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352020 Stats: 105 lines in 3 files changed: 99 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24082/head:pull/24082 PR: https://git.openjdk.org/jdk/pull/24082 From epeter at openjdk.org Mon Mar 17 16:29:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Mar 2025 16:29:41 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input [v3] In-Reply-To: References: Message-ID: <7j6ND6FpdMgH5NNETgnSujwqgQWIZTyV5gF3mphT_2I=.0922247b-51d6-4e60-b2eb-10fff8451f01@github.com> On Sat, 15 Mar 2025 00:56:25 GMT, Sandhya Viswanathan wrote: >> Float.float16ToFloat generates wrong vectorized code in product build and asserts in fastdebug/debug when argument is of type byte, int, or long array. The short term solution is to not auto vectorize in these cases. >> >> Review comments are welcome. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Review comments from Emanuel Tests are passing. Approved. @sviswa7 Thanks for the work. ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23939#pullrequestreview-2691176744 From sviswanathan at openjdk.org Mon Mar 17 17:02:08 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 17 Mar 2025 17:02:08 GMT Subject: RFR: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input [v3] In-Reply-To: References: <1TIUz8O4xjCwMArQYWlPJ4qRR9SBVpux0cceH9m2X5k=.521532a4-9ea7-4031-aa98-a60ce2c8982a@github.com> Message-ID: On Mon, 17 Mar 2025 06:58:39 GMT, Emanuel Peter wrote: >> @eme64 @jatin-bhateja Your review comments are handled. Please take a look. > > @sviswa7 The code and test looks good to me now. I'm re-running testing. Please ping me in a day for the results :) Thanks a lot @eme64. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23939#issuecomment-2730244888 From sparasa at openjdk.org Mon Mar 17 17:03:09 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 17 Mar 2025 17:03:09 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 15:28:12 GMT, Rohit Arul Raj wrote: >> This change enables optimized SIMD sort for AMD Zen 4 (AVX2) & Zen 5 (AVX512). >> >> JTREG Tests: Completed Tier1 & Tier2 tests on Zen4 & Zen5 - No Regressions. >> >> Attaching ArraySort performance data for Zen4 & Zen5. 
>> [Zen4-ArraySort-Data.txt](https://github.com/user-attachments/files/19245831/Zen4-ArraySort-Data.txt) >> [Zen5-ArraySort-Data.txt](https://github.com/user-attachments/files/19245833/Zen5-ArraySort-Data.txt) > > Rohit Arul Raj has updated the pull request incrementally with one additional commit since the last revision: > > create a separate method to check for cpu's supporting avx512 version of simd sort src/hotspot/cpu/x86/vm_version_x86.hpp line 777: > 775: > 776: static bool supports_avx512_simd_sort() { > 777: // Disable AVX512 version of SIMD Sort on AMD Zen4 Processors Looks like you forgot to remove the comment: `// Disable AVX512 version of SIMD Sort on AMD Zen4 Processors` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24053#discussion_r1999244927 From sparasa at openjdk.org Mon Mar 17 17:06:11 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 17 Mar 2025 17:06:11 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 15:28:12 GMT, Rohit Arul Raj wrote: >> This change enables optimized SIMD sort for AMD Zen 4 (AVX2) & Zen 5 (AVX512). >> >> JTREG Tests: Completed Tier1 & Tier2 tests on Zen4 & Zen5 - No Regressions. >> >> Attaching ArraySort performance data for Zen4 & Zen5. 
>> [Zen4-ArraySort-Data.txt](https://github.com/user-attachments/files/19245831/Zen4-ArraySort-Data.txt) >> [Zen5-ArraySort-Data.txt](https://github.com/user-attachments/files/19245833/Zen5-ArraySort-Data.txt) > > Rohit Arul Raj has updated the pull request incrementally with one additional commit since the last revision: > > create a separate method to check for cpu's supporting avx512 version of simd sort src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4318: > 4316: // Load x86_64_sort library on supported hardware to enable SIMD sort and partition intrinsics > 4317: > 4318: if (VM_Version::supports_avx512dq() || VM_Version::supports_avx2()) { Shouldn't you check for `VM_Version::supports_avx512_simd_sort()` here as well? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24053#discussion_r1999252275 From kvn at openjdk.org Mon Mar 17 17:17:10 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 17 Mar 2025 17:17:10 GMT Subject: RFR: 8352020: [CompileFramework] enable compilation for VectorAPI In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 15:53:17 GMT, Emanuel Peter wrote: > During work on [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942) I discovered that it is currently not possible to compile VectorAPI code because it is still in incubator mode and needs flag "--add-modules=jdk.incubator.vector" for "javac". > > Also: "javac" can produce warnings, and that leads to issues like this: [JDK-8351998](https://bugs.openjdk.org/browse/JDK-8351998). We should allow such warnings, they are not compile failures. > > Example: > > javac --add-modules=jdk.incubator.vector Test.java > warning: [incubating] using incubating module(s): jdk.incubator.vector > 1 warning > > > I added an example test as well. Very good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24082#pullrequestreview-2691435911 From rraj at openjdk.org Mon Mar 17 17:22:08 2025 From: rraj at openjdk.org (Rohit Arul Raj) Date: Mon, 17 Mar 2025 17:22:08 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 17:00:28 GMT, Srinivas Vamsi Parasa wrote: >> Rohit Arul Raj has updated the pull request incrementally with one additional commit since the last revision: >> >> create a separate method to check for cpu's supporting avx512 version of simd sort > > src/hotspot/cpu/x86/vm_version_x86.hpp line 777: > >> 775: >> 776: static bool supports_avx512_simd_sort() { >> 777: // Disable AVX512 version of SIMD Sort on AMD Zen4 Processors > > Looks like you forgot to remove the comment: `// Disable AVX512 version of SIMD Sort on AMD Zen4 Processors` For Zen4, we are disabling AVX512 version of SIMD Sort and using AVX2 version. So the comment is valid. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24053#discussion_r1999275419 From sparasa at openjdk.org Mon Mar 17 17:22:08 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 17 Mar 2025 17:22:08 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 17:17:29 GMT, Rohit Arul Raj wrote: >> src/hotspot/cpu/x86/vm_version_x86.hpp line 777: >> >>> 775: >>> 776: static bool supports_avx512_simd_sort() { >>> 777: // Disable AVX512 version of SIMD Sort on AMD Zen4 Processors >> >> Looks like you forgot to remove the comment: `// Disable AVX512 version of SIMD Sort on AMD Zen4 Processors` > > For Zen4, we are disabling AVX512 version of SIMD Sort and using AVX2 version. So the comment is valid. Got it! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24053#discussion_r1999281629 From rraj at openjdk.org Mon Mar 17 17:29:08 2025 From: rraj at openjdk.org (Rohit Arul Raj) Date: Mon, 17 Mar 2025 17:29:08 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 17:03:41 GMT, Srinivas Vamsi Parasa wrote: >> Rohit Arul Raj has updated the pull request incrementally with one additional commit since the last revision: >> >> create a separate method to check for cpu's supporting avx512 version of simd sort > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4318: > >> 4316: // Load x86_64_sort library on supported hardware to enable SIMD sort and partition intrinsics >> 4317: >> 4318: if (VM_Version::supports_avx512dq() || VM_Version::supports_avx2()) { > > Shouldn't you check for `VM_Version::supports_avx512_simd_sort()` here as well? The above condition will hold for all AMD processors. Only for Zen4, even though AVX512 is supported, we want to pick AVX2 version of SIMD sort (due to the regression) which is handled by the code below: snprintf(ebuf_, sizeof(ebuf_), **VM_Version::supports_avx512_simd_sort()** ? "avx512_sort" : "avx2_sort"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24053#discussion_r1999288544 From sparasa at openjdk.org Mon Mar 17 17:29:08 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 17 Mar 2025 17:29:08 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 17:23:26 GMT, Rohit Arul Raj wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4318: >> >>> 4316: // Load x86_64_sort library on supported hardware to enable SIMD sort and partition intrinsics >>> 4317: >>> 4318: if (VM_Version::supports_avx512dq() || VM_Version::supports_avx2()) { >> >> Shouldn't you check for `VM_Version::supports_avx512_simd_sort()` here as well? 
> > The above condition will hold for all AMD processors. Only for Zen4, even though AVX512 is supported, we want to pick AVX2 version of SIMD sort (due to the regression) which is handled by the code below: > > snprintf(ebuf_, sizeof(ebuf_), **VM_Version::supports_avx512_simd_sort()** ? "avx512_sort" : "avx2_sort"); Thanks for the clarification! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24053#discussion_r1999292833 From sparasa at openjdk.org Mon Mar 17 17:33:07 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 17 Mar 2025 17:33:07 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 17:26:04 GMT, Srinivas Vamsi Parasa wrote: >> The above condition will hold for all AMD processors. Only for Zen4, even though AVX512 is supported, we want to pick AVX2 version of SIMD sort (due to the regression) which is handled by the code below: >> >> snprintf(ebuf_, sizeof(ebuf_), **VM_Version::supports_avx512_simd_sort()** ? "avx512_sort" : "avx2_sort"); > > Thanks for the clarification! Also, please update the PR description summarizing the main high-level changes in this PR. Will make it easy for others. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24053#discussion_r1999302156 From rraj at openjdk.org Mon Mar 17 17:48:15 2025 From: rraj at openjdk.org (Rohit Arul Raj) Date: Mon, 17 Mar 2025 17:48:15 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 17:30:54 GMT, Srinivas Vamsi Parasa wrote: >> Thanks for the clarification! > > Also, please update the PR description summarizing the main high-level changes in this PR. Will make it easy for others. Thanks Vamsi, updated the PR description accordingly. 
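The two-level check discussed in this thread (load the sort library on any CPU with AVX2 or AVX512DQ, then pick the AVX2 entry point on Zen 4 even though AVX512 is available) can be sketched in isolation. This is a minimal standalone illustration, not the HotSpot code: `CpuFeatures`, `should_load_simd_sort_library`, and `simd_sort_symbol` are hypothetical names, and the `is_zen4` flag is an assumed stand-in for whatever CPU-family test `VM_Version::supports_avx512_simd_sort()` really performs. Only the strings `avx512_sort` and `avx2_sort` are taken from the review comments.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the feature queries that VM_Version provides.
struct CpuFeatures {
  bool avx2;
  bool avx512dq;
  bool is_zen4;  // assumption: Zen 4 is the only CPU steered away from the AVX512 sort
};

// Gate for loading the sort library at all: AVX2 or AVX512DQ is sufficient.
bool should_load_simd_sort_library(const CpuFeatures& cpu) {
  return cpu.avx512dq || cpu.avx2;
}

// Assumed shape of the new predicate: AVX512DQ is present and the CPU is not Zen 4.
bool supports_avx512_simd_sort(const CpuFeatures& cpu) {
  return cpu.avx512dq && !cpu.is_zen4;
}

// Choose the entry point name, mirroring the snprintf call quoted in the thread.
std::string simd_sort_symbol(const CpuFeatures& cpu) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%s",
                supports_avx512_simd_sort(cpu) ? "avx512_sort" : "avx2_sort");
  return std::string(buf);
}
```

Under these assumptions the outer condition can stay as it is, because the Zen 4 special case only affects which symbol is looked up, not whether the library is loaded.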
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24053#discussion_r1999325651 From sviswanathan at openjdk.org Mon Mar 17 17:53:15 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 17 Mar 2025 17:53:15 GMT Subject: Integrated: 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 01:56:49 GMT, Sandhya Viswanathan wrote: > Float.float16ToFloat generates wrong vectorized code in product build and asserts in fastdebug/debug when argument is of type byte, int, or long array. The short term solution is to not auto vectorize in these cases. > > Review comments are welcome. > > Best Regards, > Sandhya This pull request has now been integrated. Changeset: 3239919a Author: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/3239919a5a5910922ea4cb6109f94a24c5f6b4f2 Stats: 184 lines in 2 files changed: 183 ins; 0 del; 1 mod 8350835: C2 SuperWord: assert/wrong result when using Float.float16ToFloat with byte instead of short input Reviewed-by: epeter, kvn ------------- PR: https://git.openjdk.org/jdk/pull/23939 From never at openjdk.org Mon Mar 17 18:08:08 2025 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 17 Mar 2025 18:08:08 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> Message-ID: <0pScaTfNpM4LYdukKYZIdk1BluQVvUM2BdssI4bjisI=.d92d627f-ce02-406c-bed3-38d31692c2ef@github.com> On Sun, 16 Mar 2025 19:18:31 GMT, Vladimir Kozlov wrote: >> This code seems to be really old, from https://bugs.openjdk.org/browse/JDK-6743900. Tagging reviewers @tkrodriguez and @vnkozlov . To me, the formula for `to_pct` looks wrong. I would expect `b->_freq` and `target->_freq `to be multiplied together, not divided. 
> > Block::_freq is number of times this block is executed per each call of the method. It could be big number for blocks in loop and very small on not frequent path. > > `succ_prob()` is calculated based on frequencies of two block and/or corresponding branch probability: [gcm.cpp#L2100](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/gcm.cpp#L2100) > > `freq = b->_freq * b->succ_prob(j)` is number of times we take this outgoing path. So to calculate probability of taking this path in `target` block we divide `freq` on number of times `target` block is executed. > > This assumes that `b->_freq` <= `target->freq`. Which seems not true in this case and indicate a bug in how we calculate and update blocks frequencies. I agree with Vladimir that it seems like something is wrong with the block probabilities. In product it would be fine to simply clamp these values in the range of 0..100 since they are just used to compute `CFGEdge::_infrequent` so the worst thing you get is a less good layout. Refactoring the expressions so it's more clear what the requirements wouldn't hurt either. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r1999355343 From duke at openjdk.org Mon Mar 17 19:12:14 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 17 Mar 2025 19:12:14 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: <8BMaz7Ssnwk6ywuPYfgaL1VSN7XZYBzpvMWRsG12-Eg=.441d03b6-4a0b-479b-b789-d9bfd03bd346@github.com> On Sat, 15 Mar 2025 00:24:20 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/code/nmethod.cpp line 1399: >> >>> 1397: { >>> 1398: debug_only(NoSafepointVerifier nsv;) >>> 1399: assert_locked_or_safepoint(CodeCache_lock); >> >> Is this lock enough to prevent GC scan it before you finish initializing it? > > My question is related to `_state` field value. During usual nmethod creation the `_state` is `not_installed`. 
> nmethod you are copying has `in_use` state. Someone may see this state before all fields are set. > That is why I am asking if `CodeCache_lock` prevents any other VM threads from seeing it. I believe this should behave the same as creating any other nmethod. `CodeCache_lock` is the only thing the other constructors use when initializing nmethods and if this was an issue they could also encounter the same race between initialization and setting the field. Looking through the GC code also shows they do hold `CodeCache_lock` before scans. [G1](https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/gc/g1/g1HeapRegionRemSet.cpp#L112), [Shenandoah](https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/gc/shenandoah/shenandoahCodeRoots.cpp#L195), [ZGC](https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/gc/z/zNMethodTable.cpp#L215) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1999451009 From kxu at openjdk.org Mon Mar 17 19:18:25 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 17 Mar 2025 19:18:25 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v8] In-Reply-To: References: Message-ID: <2nVVIoevlBwt1i_TjZjsR_MVZuCKoazxUYwXLmVLdH8=.9705be1c-5f6e-4399-9747-0bf1dc09e180@github.com> > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495), which was [first merged](https://git.openjdk.org/jdk/pull/20754) and then backed out due to a regression. This patch redoes the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. > > When constantizing multiplications (possibly in the form of `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed.
However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added. Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: implement @eme64's suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23506/files - new: https://git.openjdk.org/jdk/pull/23506/files/851bfb2f..f71a1dc0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=06-07 Stats: 107 lines in 3 files changed: 11 ins; 43 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From kxu at openjdk.org Mon Mar 17 19:18:25 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 17 Mar 2025 19:18:25 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 17:21:06 GMT, Kangcheng Xu wrote: >> src/hotspot/share/opto/addnode.cpp line 480: >> >>> 478: if (!con->is_Con()) { >>> 479: swap(con, base); >>> 480: } >> >> Is that necessary? Does `Mul` not automatically get canonicalized so that the constant is on the rhs? > > This is not related to `Mul` canonicalization. Swapping those two variables makes my next 4 lines syntactically easier to write. (So I don't have to do `(con->is_Con ? con : base)->is_top()` and so on.) This shouldn't be any more costly with a modern C++ compiler. 
Obsolete after refactored to use `find_simple_lshift_pattern`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1999455042 From kxu at openjdk.org Mon Mar 17 19:18:25 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 17 Mar 2025 19:18:25 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: <8G-CIxC5jewxt6jG5LaR36bOLRSBPI1jn7fbqIByPhM=.fb06fe6e-96c5-412f-8236-3fce1cf32aeb@github.com> On Thu, 13 Mar 2025 08:14:04 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> add micro benchmark > > This looks really interesting! > > I see that you are doing some special pattern matching. I wonder if it might be worth generalizing the algorithm, to search through an arbitrary "tree" of additions, collect all "leaves" of (`variable * multiplier`), sort by `variable`, and compute new additions for each `variable`. What do you think? > @eme64 I wonder if it might be worth generalizing the algorithm, to search through an arbitrary "tree" of additions, collect all "leaves" of (`variable * multiplier`) [...] This was actually my first approach, a top-down approach starts from "root" and collect all leaves to convert to multiplications in one go. It didn't work mainly due to two problems: 1. Not being a technical "tree" introduces additional complexity with repeated nodes (and/or sub-"trees") It's very likely that a subtree contains repeated nodes (e.g., `Add(x, x)`). This is especially problematic if `x` itself is a complex subgraph. Deduplication (probably with memoization) introduces additional complexity. Luckily, even if it's not a tree, it's still strictly a DAG. A bottom-up approach converting leaves incrementally would avoid the issue of duplicated computation. 2. 
A very large tree results in resource exhaustion on one (i)GVN pass It's common for some simple loop to be unrolled into a very large subgraph, in which case the transforming the entire graph in one go is blocking and takes a long time to run. By dividing work into individual passes, we allow the optimization to progress and reduce memory footprint. > src/hotspot/share/opto/addnode.cpp line 447: > >> 445: >> 446: return nullptr; >> 447: } > > I'm not a great fan of "output arguments" such as the `multiplier` here. > Why not create a class/struct `Multiplication`, which has a field `valid` (instead of returning `nullptr`). And fields `variable` and `multiplier`. The fields can all be constant. > You could even have an `add` method, that adds two such `Multiplication`s together. Good suggestion. I didn't make fields `const` as implicitly deleting the constructor makes pattern matching more complex. > src/hotspot/share/opto/addnode.cpp line 500: > >> 498: // Note that one of the term of the addition could simply be `a` (i.e., a << 0). Calling this function with `multiplier` >> 499: // being null is safe. >> 500: Node* AddNode::find_power_of_two_addition_pattern(Node* n, BasicType bt, jlong* multiplier) { > > This code here looks quite complicated. Why not parse both sides of the add with a `find_simple_lshift_pattern`, and then check that they use the same variable? Good point. 
Updated to use `find_simple_lshift_pattern` ------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-2730570456 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1999453274 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1999454340 From kxu at openjdk.org Mon Mar 17 19:20:16 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 17 Mar 2025 19:20:16 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: <3YGofYXMD3fLbbCcOOtRLbHp8qLnyTcZ_lY8gOIzT-A=.39d80f33-47a8-4ef4-ab6d-700ee6eaa346@github.com> References: <3YGofYXMD3fLbbCcOOtRLbHp8qLnyTcZ_lY8gOIzT-A=.39d80f33-47a8-4ef4-ab6d-700ee6eaa346@github.com> Message-ID: On Thu, 13 Mar 2025 09:09:21 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> add micro benchmark > > test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 310: > >> 308: CON3_L = genL.next(); >> 309: CON4_L = genL.next(); >> 310: } > > Is there a reason why you are restricting the values to `powerOfTwoLongs`? I think it would be better if you just take the most general generator. > > private static final RestrictableGenerator GEN_INT = Generators.G.ints(); > private static final RestrictableGenerator GEN_LONG = Generators.G.longs(); Sorry misunderstood your previous comment. I thought you wanted to test especially power-of-two's. 
Now I understand you meant more "interesting" randoms (potentially close to power-of-two's) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1999461474 From kvn at openjdk.org Mon Mar 17 19:51:11 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 17 Mar 2025 19:51:11 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: <8BMaz7Ssnwk6ywuPYfgaL1VSN7XZYBzpvMWRsG12-Eg=.441d03b6-4a0b-479b-b789-d9bfd03bd346@github.com> References: <8BMaz7Ssnwk6ywuPYfgaL1VSN7XZYBzpvMWRsG12-Eg=.441d03b6-4a0b-479b-b789-d9bfd03bd346@github.com> Message-ID: On Mon, 17 Mar 2025 19:09:58 GMT, Chad Rakoczy wrote: >> My question is related to `_state` field value. During usual nmethod creation the `_state` is `not_installed`. >> nmethod you are coping has `in_use` state. Someone may see this state before all fields are set. >> That is why I am asking if `CodeCache_lock` prevents any other VM's threads see it. > > I believe this should behave the same as creating any other nmethod. `CodeCache_lock` is the only thing the other constructors use when initializing nmethods and if this was an issue they could also encounter the same race between initialization and setting the field. > > Looking through the GC code also shows they do hold `CodeCache_lock` before scans. [G1](https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/gc/g1/g1HeapRegionRemSet.cpp#L112), [Shenandoah](https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/gc/shenandoah/shenandoahCodeRoots.cpp#L195), [ZGC](https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/gc/z/zNMethodTable.cpp#L215) Thank you for checking it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1999506966 From chagedorn at openjdk.org Mon Mar 17 20:27:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 17 Mar 2025 20:27:13 GMT Subject: RFR: 8352020: [CompileFramework] enable compilation for VectorAPI In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 15:53:17 GMT, Emanuel Peter wrote: > During work on [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942) I discovered that it is currently not possible to compile VectorAPI code because it is still in incubator mode and needs flag "--add-modules=jdk.incubator.vector" for "javac". > > Also: "javac" can produce warnings, and that leads to issues like this: [JDK-8351998](https://bugs.openjdk.org/browse/JDK-8351998). We should allow such warnings, they are not compile failures. > > Example: > > javac --add-modules=jdk.incubator.vector Test.java > warning: [incubating] using incubating module(s): jdk.incubator.vector > 1 warning > > > I added an example test as well. Otherwise, it looks good to me, too. test/hotspot/jtreg/compiler/lib/compile_framework/Compile.java line 200: > 198: > 199: // Note: the output can be non-empty even if the compilation succeeds, e.g. for warnings. > 200: if (exitCode != 0) { Would `-XX:-PrintWarnings` also work? test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/IRFrameworkWithVectorAPIExample.java line 45: > 43: /** > 44: * This test shows that the IR verification can be done on code compiled by the Compile Framework. > 45: * The "@compile" command for JTREG is required so that the IRFramework is compiled, other javac What about a `CompileFramework::invokeIRTest()` or something like that, that just additionally loads the `TestFramework` class somehow (for example just creating an instance or accessing a field etc., we could also provide an empty static method in `TestFramework.java` that can be called to minimize the overhead), would that work?
Then we do not need to worry about adding `@compile` or figuring out why the IR test is not working. ------------- PR Review: https://git.openjdk.org/jdk/pull/24082#pullrequestreview-2691878964 PR Review Comment: https://git.openjdk.org/jdk/pull/24082#discussion_r1999546090 PR Review Comment: https://git.openjdk.org/jdk/pull/24082#discussion_r1999551575 From duke at openjdk.org Mon Mar 17 20:36:14 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 17 Mar 2025 20:36:14 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: <6pyfiLbH52s0635HH-_C2GhjOTfa6mlLWwyVBYy9nDM=.2d4f84c3-d500-429c-8140-04ac28fd095a@github.com> On Sat, 15 Mar 2025 00:14:37 GMT, Vladimir Kozlov wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix build issues > > make/hotspot/lib/CompileJvm.gmk line 201: > >> 199: DISABLED_WARNINGS_gcc_jvmtiTagMap.cpp := stringop-overflow, \ >> 200: DISABLED_WARNINGS_gcc_macroAssembler_ppc_sha.cpp := unused-const-variable, \ >> 201: DISABLED_WARNINGS_gcc_nmethod.cpp := class-memaccess, \ > > Why you need this? Without it I get the following error: error: ‘void* memcpy(void*, const void*, size_t)’
writing to an object of non-trivially copyable type ‘class nmethod’; use copy-assignment or copy-initialization instead [-Werror=class-memaccess] ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1999603462 From duke at openjdk.org Mon Mar 17 20:44:13 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 17 Mar 2025 20:44:13 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: <6mNjgtAtJqXdjJ6dG64revjZS6EcUEyS_c_0FW2zBeo=.de409d99-20e0-4e3d-8f89-43822fd9fdb9@github.com> On Sat, 15 Mar 2025 00:15:33 GMT, Vladimir Kozlov wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix build issues > > src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 93: > >> 91: void trampoline_stub_Relocation::pd_fix_owner_after_move() { >> 92: NativeCall* call = nativeCall_at(owner()); >> 93: assert(call->raw_destination() == owner(), "destination should be empty"); > > Why it was removed? It was updated and moved [here](https://github.com/openjdk/jdk/pull/23573/files#diff-d69de3a692f24763ca460ed1cba2231ea76d5d98fc7e5018032981d9968221a6R380).
This check expects trampoline stubs to be unresolved which may not be the case at the time of nmethod relocation > src/hotspot/share/code/nmethod.hpp line 338: > >> 336: ); >> 337: >> 338: nmethod(nmethod& nm); > > No need this Thanks forgot to delete this ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1999635674 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1999636248 From chagedorn at openjdk.org Mon Mar 17 20:50:11 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 17 Mar 2025 20:50:11 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 08:48:21 GMT, Emanuel Peter wrote: > With the Template Framework, I'm generating IR tests randomly. But random code can always hit bailouts in compilation, and make code not compilable any more. We should have a way to disable this check, and just gracefully continue to execute the tests. > > To allow a single test method to be `not compilable`: > https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L160-L161 > > To allow all test methods to be `not compilable`: > https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L140-L144 > > See also this documentation in the code: > https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/compiler/lib/ir_framework/Test.java#L88-L94 > > --------------------------------------- > > **Backrgound** > > My random code seems to hit a bailout in the Register Allocator, and I cannot do much to predict if that bailout happens. 
> See https://bugs.openjdk.org/browse/JDK-8304328 Some first comments, will continue tomorrow :-) test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 417: > 415: * test failure. However, if such cases are expected in a test class, this flag can be set to true, which > 416: * allows the all test to pass even if there is no compilation. Any associated {@link IR} rule is only executed > 417: * if the test method was compiled, and else it is ignored silently. /** * In rare cases, methods may not be compilable because of a compilation bailout. By default, this leads to a * test failure. However, if such cases are expected in multiple methods in a test class, this flag can be set to * true, which allows any test to pass even if there is a compilation bailout. If only selected methods are prone * to bail out, it is preferred to use {@link Test#allowNotCompilable()} instead for more fine-grained control. * By setting this flag, any associated {@link IR} rule of a test is only executed if the test method was compiled, * and else it is ignored silently. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irmethod/NotCompilableIRMethod.java line 34: > 32: /** > 33: * This class represents a special IR method which was not compiled by the IR framework, but this was explicitly allowed > 34: * by "allowNotCompilable". Maybe add here in what context this is used: Suggestion: * by "allowNotCompilable". This happens when the compiler bails out of a compilation (i.e. no compilation) but we treat * this as valid case. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irmethod/NotCompilableIRMethodMatchResult.java line 35: > 33: /** > 34: * This class represents a special matching result of an IR method where the compilation output was completely empty, > 35: * but this was exlicitly allowed by "allowNotCompilable". Maybe also add the same addition from above here as well. 
------------- PR Review: https://git.openjdk.org/jdk/pull/24049#pullrequestreview-2692005824 PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r1999625584 PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r1999640690 PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r1999644590 From duke at openjdk.org Mon Mar 17 20:57:11 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 17 Mar 2025 20:57:11 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 13:10:20 GMT, Evgeny Astigeevich wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix build issues > > src/hotspot/share/code/nmethod.cpp line 1471: > >> 1469: // Clear inline caches before acquiring any locks >> 1470: VM_ClearNMethodICs clear_nmethod_ics(nm); >> 1471: VMThread::execute(&clear_nmethod_ics); > > This does not look correct. > Why are you doing this? > What are you trying to fix? [Fix relocation requires](https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/code/relocInfo.hpp#L858-L860) that inline caches be cleared which [must be done at a safepoint](https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/code/nmethod.cpp#L770). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1999658796 From kvn at openjdk.org Mon Mar 17 21:30:10 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 17 Mar 2025 21:30:10 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: <6pyfiLbH52s0635HH-_C2GhjOTfa6mlLWwyVBYy9nDM=.2d4f84c3-d500-429c-8140-04ac28fd095a@github.com> References: <6pyfiLbH52s0635HH-_C2GhjOTfa6mlLWwyVBYy9nDM=.2d4f84c3-d500-429c-8140-04ac28fd095a@github.com> Message-ID: On Mon, 17 Mar 2025 20:33:09 GMT, Chad Rakoczy wrote: >> make/hotspot/lib/CompileJvm.gmk line 201: >> >>> 199: DISABLED_WARNINGS_gcc_jvmtiTagMap.cpp := stringop-overflow, \ >>> 200: DISABLED_WARNINGS_gcc_macroAssembler_ppc_sha.cpp := unused-const-variable, \ >>> 201: DISABLED_WARNINGS_gcc_nmethod.cpp := class-memaccess, \ >> >> Why you need this? > > Without I get the following error > > error: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of non-trivially copyable type ‘class nmethod’; use copy-assignment or copy-initialization instead [-Werror=class-memaccess] Okay, we need to ask C++ experts here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1999698578 From eastigeevich at openjdk.org Mon Mar 17 21:41:11 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 17 Mar 2025 21:41:11 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 20:54:04 GMT, Chad Rakoczy wrote: >> src/hotspot/share/code/nmethod.cpp line 1471: >> >>> 1469: // Clear inline caches before acquiring any locks >>> 1470: VM_ClearNMethodICs clear_nmethod_ics(nm); >>> 1471: VMThread::execute(&clear_nmethod_ics); >> >> This does not look correct. >> Why are you doing this? >> What are you trying to fix?
> > [Fix relocation requires](https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/code/relocInfo.hpp#L858-L860) that inline caches be cleared which [must be done at a safepoint](https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/code/nmethod.cpp#L770). The comment on `fix_relocation_after_move` is 17 years old. Is it still valid? We make a copy of the nmethod. The copy is not in use. We know call sites might have invalid inline caches. Could we go through the call sites and clear them? Requesting a VM operation for each nmethod will be very expensive. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1999711219 From eastigeevich at openjdk.org Mon Mar 17 21:52:09 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 17 Mar 2025 21:52:09 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: <6pyfiLbH52s0635HH-_C2GhjOTfa6mlLWwyVBYy9nDM=.2d4f84c3-d500-429c-8140-04ac28fd095a@github.com> Message-ID: On Mon, 17 Mar 2025 21:27:21 GMT, Vladimir Kozlov wrote: >> Without I get the following error >> >> error: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of non-trivially copyable type ‘class nmethod’; use copy-assignment or copy-initialization instead [-Werror=class-memaccess] > > Okay, we need to ask C++ experts here. You should not disable this warning. It says the `nmethod` class is not a trivially copyable type, which means `memcpy` cannot be used for it. Our options to fix: - Make `nmethod` a trivially copyable type. - Go back to using operator new and the constructor.
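The two options listed above can be illustrated with a minimal sketch. `PlainHeader`, `TrackedHeader`, `copy_plain` and `copy_tracked_into` are hypothetical names made up for this illustration; they are not HotSpot classes or functions and say nothing about how `nmethod` is actually laid out. The sketch only assumes standard C++11: a type with implicit copy operations is trivially copyable and may be copied with `memcpy`, while a type with a user-provided copy constructor is not and should be copied by constructing a new object (here with placement new into caller-supplied storage).

```cpp
#include <cassert>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical stand-ins for illustration only; these are not HotSpot classes.

// Plain data with implicit copy operations: trivially copyable.
struct PlainHeader {
  int size;
  int flags;
};

// A user-provided copy constructor makes the type non-trivially copyable.
// This is the kind of type GCC's -Wclass-memaccess complains about.
struct TrackedHeader {
  int size;
  int copies;
  explicit TrackedHeader(int s) : size(s), copies(0) {}
  TrackedHeader(const TrackedHeader& other)
      : size(other.size), copies(other.copies + 1) {}
};

static_assert(std::is_trivially_copyable<PlainHeader>::value,
              "memcpy is well-defined for PlainHeader");
static_assert(!std::is_trivially_copyable<TrackedHeader>::value,
              "memcpy would bypass TrackedHeader's copy constructor");

// Option 1: the type is trivially copyable, so a byte-wise copy is fine.
inline PlainHeader copy_plain(const PlainHeader& src) {
  PlainHeader dst;
  std::memcpy(&dst, &src, sizeof(PlainHeader));
  return dst;
}

// Option 2: construct the copy in pre-allocated storage with placement new,
// so the copy constructor runs instead of a raw byte copy.
// The caller must pass storage that is large enough and suitably aligned.
inline TrackedHeader* copy_tracked_into(void* storage, const TrackedHeader& src) {
  return new (storage) TrackedHeader(src);
}
```

A caller would typically reserve the storage with something like `alignas(TrackedHeader) unsigned char buf[sizeof(TrackedHeader)];` before calling `copy_tracked_into(buf, original)`. Note that casting the destination pointer to `void*` only silences the GCC diagnostic; it does not change what `memcpy` does to the object, so whether a raw copy is acceptable still depends on the invariants of the class being copied.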
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1999718015 From eastigeevich at openjdk.org Mon Mar 17 21:52:10 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 17 Mar 2025 21:52:10 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: <6pyfiLbH52s0635HH-_C2GhjOTfa6mlLWwyVBYy9nDM=.2d4f84c3-d500-429c-8140-04ac28fd095a@github.com> References: <6pyfiLbH52s0635HH-_C2GhjOTfa6mlLWwyVBYy9nDM=.2d4f84c3-d500-429c-8140-04ac28fd095a@github.com> Message-ID: On Mon, 17 Mar 2025 20:33:09 GMT, Chad Rakoczy wrote: >> make/hotspot/lib/CompileJvm.gmk line 201: >> >>> 199: DISABLED_WARNINGS_gcc_jvmtiTagMap.cpp := stringop-overflow, \ >>> 200: DISABLED_WARNINGS_gcc_macroAssembler_ppc_sha.cpp := unused-const-variable, \ >>> 201: DISABLED_WARNINGS_gcc_nmethod.cpp := class-memaccess, \ >> >> Why you need this? > > Without I get the following error > > error: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of non-trivially copyable type ‘class nmethod’; use copy-assignment or copy-initialization instead [-Werror=class-memaccess] @chadrako See https://en.cppreference.com/w/cpp/language/classes#Trivially_copyable_class ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1999721739 From kvn at openjdk.org Mon Mar 17 21:52:11 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 17 Mar 2025 21:52:11 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 21:38:45 GMT, Evgeny Astigeevich wrote: >> [Fix relocation requires](https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/code/relocInfo.hpp#L858-L860) that inline caches be cleared which [must be done at a safepoint](https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/code/nmethod.cpp#L770).
> > The comment to `fix_relocation_after_move` is 17 year old. > Is it still valid? > We make a copy of nmethod. The copy is not in use. We know call sites might have invalid inline caches. Could we go through the calls sites and clear them? > > Requesting a vm operation for each nmethod will be very expensive. That is a very old comment which may not be true anymore. We don't do that when we replace C1 compiled code (tier2-3) with C2 compiled code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1999723570 From eastigeevich at openjdk.org Mon Mar 17 21:52:11 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 17 Mar 2025 21:52:11 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: <6pyfiLbH52s0635HH-_C2GhjOTfa6mlLWwyVBYy9nDM=.2d4f84c3-d500-429c-8140-04ac28fd095a@github.com> Message-ID: <2xjIC7o0_vpKL9pWJGZjo7C_notQoe7xedzqmSPN5Uk=.75adb957-b63c-4e9c-abc3-33c880b05211@github.com> On Mon, 17 Mar 2025 21:47:06 GMT, Evgeny Astigeevich wrote: >> Without I get the following error >> >> error: ‘void* memcpy(void*, const void*, size_t)’
writing to an object of non-trivially copyable type ‘class nmethod’; use copy-assignment or copy-initialization instead [-Werror=class-memaccess] > > @chadrako See https://en.cppreference.com/w/cpp/language/classes#Trivially_copyable_class And https://en.cppreference.com/w/cpp/named_req/TriviallyCopyable ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1999723016 From kvn at openjdk.org Mon Mar 17 21:57:21 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 17 Mar 2025 21:57:21 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: <2xjIC7o0_vpKL9pWJGZjo7C_notQoe7xedzqmSPN5Uk=.75adb957-b63c-4e9c-abc3-33c880b05211@github.com> References: <6pyfiLbH52s0635HH-_C2GhjOTfa6mlLWwyVBYy9nDM=.2d4f84c3-d500-429c-8140-04ac28fd095a@github.com> <2xjIC7o0_vpKL9pWJGZjo7C_notQoe7xedzqmSPN5Uk=.75adb957-b63c-4e9c-abc3-33c880b05211@github.com> Message-ID: On Mon, 17 Mar 2025 21:48:30 GMT, Evgeny Astigeevich wrote: >> @chadrako See https://en.cppreference.com/w/cpp/language/classes#Trivially_copyable_class > > And https://en.cppreference.com/w/cpp/named_req/TriviallyCopyable We can use `Copy::disjoint_words()` as we do in `CodeBuffer::relocate_code_to()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1999729110 From kvn at openjdk.org Mon Mar 17 22:16:09 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 17 Mar 2025 22:16:09 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: Message-ID: <3C9M1LWF86Hjsu8s3SbJLBpP5HfI3BHOkhid2SHFqVw=.6195af54-50c3-480b-8994-df7c317ac3bc@github.com> On Mon, 17 Mar 2025 21:49:04 GMT, Vladimir Kozlov wrote: >> The comment to `fix_relocation_after_move` is 17 year old. >> Is it still valid? >> We make a copy of nmethod. The copy is not in use. We know call sites might have invalid inline caches. Could we go through the calls sites and clear them?
>> >> Requesting a vm operation for each nmethod will be very expensive. > > That is very old comment which may be not true anymore. We don't do that when we replace C1 compiled code (tier2-3) with C2 compiled code. Yes, we need to update call sites. Should we replace all resolved calls with calls to `resolve_*_call` blobs? Actually `clean_if_nmethod_is_unloaded()` does that. Maybe we indeed need to call `nmethod::cleanup_inline_caches_impl()` but without a VM operation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1999746881 From kvn at openjdk.org Mon Mar 17 22:21:19 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 17 Mar 2025 22:21:19 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: <3C9M1LWF86Hjsu8s3SbJLBpP5HfI3BHOkhid2SHFqVw=.6195af54-50c3-480b-8994-df7c317ac3bc@github.com> References: <3C9M1LWF86Hjsu8s3SbJLBpP5HfI3BHOkhid2SHFqVw=.6195af54-50c3-480b-8994-df7c317ac3bc@github.com> Message-ID: On Mon, 17 Mar 2025 22:13:52 GMT, Vladimir Kozlov wrote: >> That is very old comment which may be not true anymore. We don't do that when we replace C1 compiled code (tier2-3) with C2 compiled code. > > Yes, we need to update call sites. Should we replace all resolved calls with calls to `resolve_*_call` blobs? > Actually `clean_if_nmethod_is_unloaded()` do that. May be we indeed need to call `nmethod::cleanup_inline_caches_impl()` but without VM operation. We need to do that only for the new copy of the nmethod and not for the old one.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1999750390 From duke at openjdk.org Mon Mar 17 22:26:08 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 17 Mar 2025 22:26:08 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v4] In-Reply-To: References: <6pyfiLbH52s0635HH-_C2GhjOTfa6mlLWwyVBYy9nDM=.2d4f84c3-d500-429c-8140-04ac28fd095a@github.com> <2xjIC7o0_vpKL9pWJGZjo7C_notQoe7xedzqmSPN5Uk=.75adb957-b63c-4e9c-abc3-33c880b05211@github.com> Message-ID: On Mon, 17 Mar 2025 21:54:57 GMT, Vladimir Kozlov wrote: >> And https://en.cppreference.com/w/cpp/named_req/TriviallyCopyable > > We can use `Copy::disjoint_words()` as we do in `CodeBuffer::relocate_code_to().` Casting destination to `void*` is enough to make `memcpy` happy ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r1999756993 From duke at openjdk.org Tue Mar 18 00:02:26 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 18 Mar 2025 00:02:26 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v5] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with six additional commits since the last revision: - Cast dest to void star - Update tests - Immutable data references updates - Remove current code heap check - Remove old copy constructor from header file - Remove relocate all ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/7b448c6c..8f12fd3d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=03-04 Stats: 398 lines in 9 files changed: 83 ins; 276 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Tue Mar 18 00:05:58 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 18 Mar 2025 00:05:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v6] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with three additional commits since the last revision: - Fix copywrite - revert - Remove DISABLED_WARNINGS ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/8f12fd3d..c8827627 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=04-05 Stats: 5 lines in 3 files changed: 2 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From sviswanathan at openjdk.org Tue Mar 18 00:35:10 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 18 Mar 2025 00:35:10 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 06:25:38 GMT, Jatin Bhateja wrote: >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 >> >> The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). >> >> Summary of changes included with the patch: >> 1. C2 compiler New Vector IR creation. >> 2. Auto-vectorization support. >> 3. x86 backend implementation. >> 4. New IR verification test for each newly supported vector operation. 
>> >> Following are the performance numbers of Float16OperationsBenchmark >> >> System : Intel(R) Xeon(R) Processor code-named Granite rapids >> Frequency fixed at 2.5 GHz >> >> >> Baseline >> Benchmark (vectorDim) Mode Cnt Score Error Units >> Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms >> Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms >> Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms >> Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms >> Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms >> Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms >> Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms >> Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms >> Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms >> Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms >> Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms >> Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms >> Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms >> Float16OperationsBenchmark.isInfiniteCMovBen... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 > - Updating benchmark > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 > - Updating copyright > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 > - Add MinVHF/MaxVHF to commutative op list > - Auto Vectorization support for Float16 operations. src/hotspot/cpu/x86/x86.ad line 11034: > 11032: %{ > 11033: match(Set dst (FmaVHF src2 (Binary dst src1))); > 11034: effect(DEF dst); DEF dst is the default behavior, do we need the effect statement here? src/hotspot/cpu/x86/x86.ad line 11046: > 11044: %{ > 11045: match(Set dst (FmaVHF src2 (Binary dst (VectorReinterpret (LoadVector src1))))); > 11046: effect(DEF dst); DEF dst is the default behavior, do we need the effect statement here? test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 65: > 63: input2[i] = floatToFloat16(rng.nextFloat()); > 64: input3[i] = floatToFloat16(rng.nextFloat()); > 65: } You could use the new Generators fill method here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r1999876698 PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r1999876291 PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r1999897286 From xgong at openjdk.org Tue Mar 18 01:28:07 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 18 Mar 2025 01:28:07 GMT Subject: RFR: 8351627: C2 AArch64 ROR/ROL: assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value In-Reply-To: References: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> Message-ID: On Mon, 17 Mar 2025 10:05:41 GMT, Andrew Dinn wrote: >> The following assertion fails on AArch64: >> >> >> Internal Error (jdk/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:2991), pid=3822987, tid=3823007 >> assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value >> >> >> with a simple Vector API case: >> >> public static IntVector test() { >> IntVector iv = IntVector.zero(IntVector.SPECIES_128); >> return iv.lanewise(VectorOperators.ROR, iv); >> } >> >> >> On AArch64, vector `ROR/ROL` (rotate right/left) operations are implemented with a combination of shifts. Please see the pattern for `ROR`: >> >> >> lsr dst1, src, cnt // unsigned right shift >> lsl dst2, src, bitSize - cnt // left shift >> orr dst, dst1, dst2 // logical or >> >> where `bitSize` is the element type width (e.g. `32` for `int`). In above case, `cnt` is a zero constant, resulting in a left shift of 32 (`bitSize - 0`), which exceeds the instruction's valid shift count range and triggers the assertion. To fix this, we need to mask the shift count to ensure it stays within valid range when calculating shift counts for rotate operations: `shiftCnt = shiftCnt & (bitSize - 1)`. >> >> Note that the mask is only necessary for constant shift counts. This not only fixes the assertion failure, but also allows `ROR/ROL src, 0` to be optimized to `src` directly. 
>> >> For vector variables as shift counts, the masking can be safely omitted because: >> 1. Vector shift instructions that take a vector register as the shift count may not automatically apply modulo arithmetic based on register size. When the shift count is `32` for int type, the result may be either `zeros` or `src`. However, this doesn't affect correctness for rotate since the final result is combined with `src` using a logical `OR` operation. >> 2. It saves a vector logical `AND` for masking, which is friendly to the performance. > > test/hotspot/jtreg/compiler/vectorapi/TestRotateWithZero.java line 80: > >> 78: >> 79: private static void rotateLeftWithZero() { >> 80: IntVector vzero = IntVector.zero(I_SPECIES); > > This local is unused. Thanks for your review! Good catch. I will remove it soon. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24051#discussion_r1999966633 From xgong at openjdk.org Tue Mar 18 02:36:14 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 18 Mar 2025 02:36:14 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v4] In-Reply-To: References: Message-ID: > The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. > > Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. 
> > This patch added the rearrange support for vector types with small lane count. Here are the main changes: > - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) > - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation > - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one > > Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. > > 1) NEON > > JMH on panama-vector:vectorIntrinsics: > > Benchmark (size) Mode Cnt Units Before After Gain > Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x > Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x > Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x > Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x > Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x > Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x > Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x > Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x > Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x > Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x > Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x > Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.999 747.848 10.38x > > > JMH on jdk mainline: > > Benchmark ... 
Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Update IR test based on the review comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23790/files - new: https://git.openjdk.org/jdk/pull/23790/files/3d44a05b..1cbff61f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23790&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23790&range=02-03 Stats: 116 lines in 1 file changed: 94 ins; 5 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/23790.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23790/head:pull/23790 PR: https://git.openjdk.org/jdk/pull/23790 From xgong at openjdk.org Tue Mar 18 02:39:13 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 18 Mar 2025 02:39:13 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 07:33:16 GMT, Emanuel Peter wrote: >> But the testing on my side so far looks good. I'll rerun once you add your IR tests. > >> Thanks for looking at this PR again @eme64 ! Vector API has its own jtreg tests under `test/jdk/jdk/incubator/vector/`. I double checked that it has the `rearrange` test for all vector species. Please see one of the test here: https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Long128VectorTests.java#L4954 That's also way I did not add the correct tests in the IR test file. > > Alright. I think result verification would still be good practice, and not too difficult to do using a `@Check` method and `Verify.java` for comparing the resulting arrays. But I leave that up to you. 
In my experience, the VectorAPI test coverage is not as good as I first thought, see the list of bugs I recently found: > https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC > > So adding a little more rigor to your IR test could catch possible bugs that the existing tests simply do not cover. Hi @eme64, the IR test is updated according to your suggestion. Could you please look at it again? Thanks so much! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2731435519 From qxing at openjdk.org Tue Mar 18 03:25:10 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 18 Mar 2025 03:25:10 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v2] In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 09:00:38 GMT, Qizheng Xing wrote: >> In `PhaseIdealLoop`, `IdealLoopTree::check_safepts` method checks if any call that is guaranteed to have a safepoint dominates the tail of the loop. In the previous implementation, `check_safepts` would stop if it found a local non-call safepoint. At this time, if there was a call before the safepoint in the dom-path, this safepoint would not be eliminated. >> >> loop-safepoint >> >> This patch changes the behavior of `check_safepts` to not stop when it finds a non-local safepoint. This makes simple loops with one method call ~3.8% faster (on aarch64). >> >> >> Benchmark Mode Cnt Score Error Units >> LoopSafepoint.loopVar avgt 15 208296.259 ± 1350.409 ns/op # baseline >> LoopSafepoint.loopVar avgt 15 200692.874 ± 616.770 ns/op # this patch >> >> >> Testing: tier1-2 on x86_64 and aarch64. > > Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into enhance-loop-safepoint-elim > - Add IR test and microbench. > - Make `PhaseIdealLoop` eliminate more redundant safepoints in loops. Hi all, This patch has now passed all GHA tests and is ready for further reviews. If there are any other suggestions for this PR, please let me know. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-2731505522 From xgong at openjdk.org Tue Mar 18 03:51:55 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 18 Mar 2025 03:51:55 GMT Subject: RFR: 8351627: C2 AArch64 ROR/ROL: assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value [v2] In-Reply-To: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> References: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> Message-ID: > The following assertion fails on AArch64: > > > Internal Error (jdk/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:2991), pid=3822987, tid=3823007 > assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value > > > with a simple Vector API case: > > public static IntVector test() { > IntVector iv = IntVector.zero(IntVector.SPECIES_128); > return iv.lanewise(VectorOperators.ROR, iv); > } > > > On AArch64, vector `ROR/ROL` (rotate right/left) operations are implemented with a combination of shifts. Please see the pattern for `ROR`: > > > lsr dst1, src, cnt // unsigned right shift > lsl dst2, src, bitSize - cnt // left shift > orr dst, dst1, dst2 // logical or > > where `bitSize` is the element type width (e.g. `32` for `int`). In above case, `cnt` is a zero constant, resulting in a left shift of 32 (`bitSize - 0`), which exceeds the instruction's valid shift count range and triggers the assertion. 
To fix this, we need to mask the shift count to ensure it stays within valid range when calculating shift counts for rotate operations: `shiftCnt = shiftCnt & (bitSize - 1)`. > > Note that the mask is only necessary for constant shift counts. This not only fixes the assertion failure, but also allows `ROR/ROL src, 0` to be optimized to `src` directly. > > For vector variables as shift counts, the masking can be safely omitted because: > 1. Vector shift instructions that take a vector register as the shift count may not automatically apply modulo arithmetic based on register size. When the shift count is `32` for int type, the result may be either `zeros` or `src`. However, this doesn't affect correctness for rotate since the final result is combined with `src` using a logical `OR` operation. > 2. It saves a vector logical `AND` for masking, which is friendly to the performance. Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Update the test case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24051/files - new: https://git.openjdk.org/jdk/pull/24051/files/6bf57d0e..82b58ea4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24051&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24051&range=00-01 Stats: 28 lines in 1 file changed: 4 ins; 6 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/24051.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24051/head:pull/24051 PR: https://git.openjdk.org/jdk/pull/24051 From xgong at openjdk.org Tue Mar 18 05:45:14 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 18 Mar 2025 05:45:14 GMT Subject: RFR: 8351627: C2 AArch64 ROR/ROL: assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value [v2] In-Reply-To: References: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> Message-ID: On Mon, 17 Mar 2025 10:07:31 GMT, Andrew Dinn wrote: >> Xiaohong Gong has updated the pull 
request incrementally with one additional commit since the last revision: >> >> Update the test case > > Looks good. Hi @adinn , test has been updated. Thanks for your reviewing! Hi @chhagedorn could you please help to take a look at this PR? Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24051#issuecomment-2731736258 From epeter at openjdk.org Tue Mar 18 06:54:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 06:54:15 GMT Subject: RFR: 8352020: [CompileFramework] enable compilation for VectorAPI In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 20:16:59 GMT, Christian Hagedorn wrote: >> During work on [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942) I discovered that it is currently not possible to compile VectorAPI code because it is still in incubator mode and needs flag "--add-modules=jdk.incubator.vector" for "javac". >> >> Also: "javac" can produce warnings, and that leads to issues like this: [JDK-8351998](https://bugs.openjdk.org/browse/JDK-8351998). We should allow such warnings, they are not compile failures. >> >> Example: >> >> javac --add-modules=jdk.incubator.vector Test.java >> warning: [incubating] using incubating module(s): jdk.incubator.vector >> 1 warning >> >> >> I added an example test as well. > > test/hotspot/jtreg/compiler/lib/compile_framework/Compile.java line 200: > >> 198: >> 199: // Note: the output can be non-empty even if the compilation succeeds, e.g. for warnings. >> 200: if (exitCode != 0) { > > Would `-XX:-PrintWarnings` also work? Maybe... but I don't want to risk it. There have recently been a few sightings where `javac` printed some messages, see [JDK-8351998](https://bugs.openjdk.org/browse/JDK-8351998). I think it's just not worth it to fail if it prints anything. Exit code should be sufficient. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24082#discussion_r2000326887 From epeter at openjdk.org Tue Mar 18 06:59:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 06:59:07 GMT Subject: RFR: 8352020: [CompileFramework] enable compilation for VectorAPI In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 20:21:17 GMT, Christian Hagedorn wrote: >> During work on [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942) I discovered that it is currently not possible to compile VectorAPI code because it is still in incubator mode and needs flag "--add-modules=jdk.incubator.vector" for "javac". >> >> Also: "javac" can produce warnings, and that leads to issues like this: [JDK-8351998](https://bugs.openjdk.org/browse/JDK-8351998). We should allow such warnings, they are not compile failures. >> >> Example: >> >> javac --add-modules=jdk.incubator.vector Test.java >> warning: [incubating] using incubating module(s): jdk.incubator.vector >> 1 warning >> >> >> I added an example test as well. > > test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/IRFrameworkWithVectorAPIExample.java line 45: > >> 43: /** >> 44: * This test shows that the IR verification can be done on code compiled by the Compile Framework. >> 45: * The "@compile" command for JTREG is required so that the IRFramework is compiled, other javac > > What about a `CompileFramework::invokeIRTest()` or something like that, that just additionally loads the `TestFramework` class somehow (for example just creating an instance or accessing a field etc., we could also provide an empty static method in `TestFramework.java` that can be called to minimize the overhead), would that work? Then we do not need to worry about adding `@compile` or figuring out why the IR test is not working. Maybe... but is this kind of magic really worth it? It would also mean that any `CompileFramework` test always loads the `TestFramework`.
And if we decide to do this, I think it would be a separate RFE, since I'm just copying from `test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/IRFrameworkJavaExample.java` here ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24082#discussion_r2000332758 From epeter at openjdk.org Tue Mar 18 07:06:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 07:06:07 GMT Subject: RFR: 8352020: [CompileFramework] enable compilation for VectorAPI [v2] In-Reply-To: References: Message-ID: > During work on [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942) I discovered that it is currently not possible to compile VectorAPI code because it is still in incubator mode and needs flag "--add-modules=jdk.incubator.vector" for "javac". > > Also: "javac" can produce warnings, and that leads to issues like this: [JDK-8351998](https://bugs.openjdk.org/browse/JDK-8351998). We should allow such warnings, they are not compile failures. > > Example: > > javac --add-modules=jdk.incubator.vector Test.java > warning: [incubating] using incubating module(s): jdk.incubator.vector > 1 warning > > > I added an example test as well. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24082/files - new: https://git.openjdk.org/jdk/pull/24082/files/3b9eb4ce..fb4cd1c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24082&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24082&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24082/head:pull/24082 PR: https://git.openjdk.org/jdk/pull/24082 From epeter at openjdk.org Tue Mar 18 07:12:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 07:12:03 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable [v2] In-Reply-To: References: Message-ID: <9OtjMzuRRR6XQV2puQbP_rfhzLYI0lCgaJtaGqbpcpk=.c4b10cdc-0afd-40a4-a916-c1b1be2bf2e1@github.com> > With the Template Framework, I'm generating IR tests randomly. But random code can always hit bailouts in compilation, and make code not compilable any more. We should have a way to disable this check, and just gracefully continue to execute the tests. 
> > To allow a single test method to be `not compilable`: > https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L160-L161 > > To allow all test methods to be `not compilable`: > https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L140-L144 > > See also this documentation in the code: > https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/compiler/lib/ir_framework/Test.java#L88-L94 > > --------------------------------------- > > **Background** > > My random code seems to hit a bailout in the Register Allocator, and I cannot do much to predict if that bailout happens. > See https://bugs.openjdk.org/browse/JDK-8304328 Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: documentation from Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24049/files - new: https://git.openjdk.org/jdk/pull/24049/files/ce40f140..e98dd89a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24049&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24049&range=00-01 Stats: 9 lines in 3 files changed: 4 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24049.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24049/head:pull/24049 PR: https://git.openjdk.org/jdk/pull/24049 From epeter at openjdk.org Tue Mar 18 07:12:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 07:12:04 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 20:47:10 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> documentation from Christian > > Some
first comments, will continue tomorrow :-) @chhagedorn Thanks for the review and suggestions, I applied them all :) > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 417: > >> 415: * test failure. However, if such cases are expected in a test class, this flag can be set to true, which >> 416: * allows the all test to pass even if there is no compilation. Any associated {@link IR} rule is only executed >> 417: * if the test method was compiled, and else it is ignored silently. > > /** > * In rare cases, methods may not be compilable because of a compilation bailout. By default, this leads to a > * test failure. However, if such cases are expected in multiple methods in a test class, this flag can be set to > * true, which allows any test to pass even if there is a compilation bailout. If only selected methods are prone > * to bail out, it is preferred to use {@link Test#allowNotCompilable()} instead for more fine-grained control. > * By setting this flag, any associated {@link IR} rule of a test is only executed if the test method was compiled, > * and else it is ignored silently. Applied :) > test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irmethod/NotCompilableIRMethod.java line 34: > >> 32: /** >> 33: * This class represents a special IR method which was not compiled by the IR framework, but this was explicitly allowed >> 34: * by "allowNotCompilable". > > Maybe add here in what context this is used: > Suggestion: > > * by "allowNotCompilable". This happens when the compiler bails out of a compilation (i.e. no compilation) but we treat > * this as valid case. applied :) > test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irmethod/NotCompilableIRMethodMatchResult.java line 35: > >> 33: /** >> 34: * This class represents a special matching result of an IR method where the compilation output was completely empty, >> 35: * but this was exlicitly allowed by "allowNotCompilable". 
> > Maybe also add the same addition from above here as well. applied :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24049#issuecomment-2731901963 PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2000343081 PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2000345809 PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2000345715 From epeter at openjdk.org Tue Mar 18 07:25:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 07:25:07 GMT Subject: RFR: 8352020: [CompileFramework] enable compilation for VectorAPI [v2] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 06:56:28 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/IRFrameworkWithVectorAPIExample.java line 45: >> >>> 43: /** >>> 44: * This test shows that the IR verification can be done on code compiled by the Compile Framework. >>> 45: * The "@compile" command for JTREG is required so that the IRFramework is compiled, other javac >> >> What about a `CompileFilework::invokeIRTest()` or something like that, that just additionally loads the `TestFramework` class somehow (for example just creating an instance or accessing a field etc., we could also provide an empty static method in `TestFramework.java` that can be called to minimize the overhead), would that work? Then we do not need to worry about adding `@compile` or figuring out why the IR test is not working. > > Maybe... but is this kind of magic really worth it? It would also mean that any `CompileFraework` test always compiles the `TestFramework`. 
And if we decide to do this, I think it would be a separate RFE, since I'm just copying from `test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/IRFrameworkJavaExample.java` here ;) FYI: the issue will probably get worse over time, as there may be more and more test-library parts that are not used in the JTREG test directly, but only in the CompileFramework compiled code. I currently have a test under development where I have to write this: /* * @test * @summary Test the Template Library's expression generation for the Vector API. * @modules jdk.incubator.vector * @modules java.base/jdk.internal.misc * @library /test/lib / * @compile ../../../compiler/lib/ir_framework/TestFramework.java * @compile ../../../compiler/lib/generators/Generators.java * @compile ../../../compiler/lib/verify/Verify.java * @run driver template_library.examples.TestFuzzVectorAPI */ But I'm not sure which test libraries we should always load... maybe we can address this down the road, when it really becomes cumbersome for people, and we know more what we want? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24082#discussion_r2000365127 From rehn at openjdk.org Tue Mar 18 07:34:06 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 18 Mar 2025 07:34:06 GMT Subject: RFR: 8351902: RISC-V: Several tests fail after JDK-8351145 In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 08:57:48 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > These client tests seems not that useful to me, so the simple solution could be just disable them on riscv. > > Thanks! I think someone took a short-cut and added flagless vm. 
The correct requires should be: - * @requires vm.flagless + * @requires vm.compiler2.enabled & vm.opt.final.UseMD5Intrinsics == true ------------- PR Comment: https://git.openjdk.org/jdk/pull/24027#issuecomment-2731951810 From duke at openjdk.org Tue Mar 18 07:37:12 2025 From: duke at openjdk.org (kuaiwei) Date: Tue, 18 Mar 2025 07:37:12 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: Message-ID: > WIP. > > It worked for cases in the TestMergeLoads.java and can observe performance improvement in MergeLoadBench.getIntB . Need to check more cases. kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Revert extract value and add more tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24023/files - new: https://git.openjdk.org/jdk/pull/24023/files/fb6cd3d7..b621db1c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=02-03 Stats: 355 lines in 3 files changed: 230 ins; 111 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From epeter at openjdk.org Tue Mar 18 07:46:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 07:46:11 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 02:36:14 GMT, Xiaohong Gong wrote: >> The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. 
>> >> Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. >> >> This patch added the rearrange support for vector types with small lane count. Here are the main changes: >> - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) >> - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation >> - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one >> >> Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. 
>> >> 1) NEON >> >> JMH on panama-vector:vectorIntrinsics: >> >> Benchmark (size) Mode Cnt Units Before After Gain >> Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x >> Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x >> Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x >> Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x >> Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x >> Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x >> Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x >> Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x >> Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x >> Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x >> Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x >> Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Update IR test based on the review comment test/hotspot/jtreg/compiler/vectorapi/VectorRearrangeTest.java line 307: > 305: public static void main(String[] args) { > 306: TestFramework testFramework = new TestFramework(); > 307: testFramework.setDefaultWarmup(10000) Oh, I just see that you are modifying the warmup. Is that necessary? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r2000391275 From chagedorn at openjdk.org Tue Mar 18 07:49:09 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 18 Mar 2025 07:49:09 GMT Subject: RFR: 8352020: [CompileFramework] enable compilation for VectorAPI [v2] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 07:22:43 GMT, Emanuel Peter wrote: >> Maybe... but is this kind of magic really worth it? It would also mean that any `CompileFraework` test always compiles the `TestFramework`. 
And if we decide to do this, I think it would be a separate RFE, since I'm just copying from `test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/IRFrameworkJavaExample.java` here ;) > > FYI: the issue will probably get worse over time, as there may be more and more test-library parts that are not used in the JTREG test directly, but only in the CompileFramework compiled code. I currently have a test under development where I have to write this: > > /* > * @test > * @summary Test the Template Library's expression generation for the Vector API. > * @modules jdk.incubator.vector > * @modules java.base/jdk.internal.misc > * @library /test/lib / > * @compile ../../../compiler/lib/ir_framework/TestFramework.java > * @compile ../../../compiler/lib/generators/Generators.java > * @compile ../../../compiler/lib/verify/Verify.java > * @run driver template_library.examples.TestFuzzVectorAPI > */ > > But I'm not sure which test libraries we should always load... maybe we can address this down the road, when it really becomes cumbersome for people, and we know more what we want? > It would also mean that any CompileFraework test always compiles the TestFramework I don't think so, wouldn't it only load it when you call `invokeIrTest()`? So, when you only call something from the `TestFramework` class there, I think it will only be loaded and initialized when you actually call `invokeIrTest()`: Object invokeIrTest(String className, String methodName, Object[] args) { TestFramework.loadClass(); return invoke(className, methodName, args); } > And if we decide to do this, I think it would be a separate RFE, Sure, that's fine. 
I only just noticed this when reading the comment :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24082#discussion_r2000394342 From xgong at openjdk.org Tue Mar 18 07:49:11 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 18 Mar 2025 07:49:11 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 07:42:41 GMT, Emanuel Peter wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Update IR test based on the review comment > > test/hotspot/jtreg/compiler/vectorapi/VectorRearrangeTest.java line 307: > >> 305: public static void main(String[] args) { >> 306: TestFramework testFramework = new TestFramework(); >> 307: testFramework.setDefaultWarmup(10000) > > Oh, I just see that you are modifying the warmup. Is that necessary? Yes, I think we'd better use a larger warmup to make sure the vector api intrinsics are inlined in C2, so that the IR check can pass. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r2000395614 From chagedorn at openjdk.org Tue Mar 18 07:49:08 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 18 Mar 2025 07:49:08 GMT Subject: RFR: 8352020: [CompileFramework] enable compilation for VectorAPI [v2] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 06:51:56 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/compile_framework/Compile.java line 200: >> >>> 198: >>> 199: // Note: the output can be non-empty even if the compilation succeeds, e.g. for warnings. >>> 200: if (exitCode != 0) { >> >> Would `-XX:-PrintWarnings` also work? > > Maybe... but I don't want to risk it. There have recently been a few sightings where `javac` printed some messages, see [JDK-8351998](https://bugs.openjdk.org/browse/JDK-8351998). I think it's just not worth it to fail if it prints anything. 
Exit code should be sufficient. Agreed, it's safer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24082#discussion_r2000394948 From epeter at openjdk.org Tue Mar 18 07:49:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 07:49:09 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 02:36:14 GMT, Xiaohong Gong wrote: >> The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. >> >> Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. >> >> This patch added the rearrange support for vector types with small lane count. Here are the main changes: >> - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) >> - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation >> - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one >> >> Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. 
>> >> 1) NEON >> >> JMH on panama-vector:vectorIntrinsics: >> >> Benchmark (size) Mode Cnt Units Before After Gain >> Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x >> Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x >> Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x >> Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x >> Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x >> Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x >> Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x >> Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x >> Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x >> Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x >> Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x >> Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.... > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Update IR test based on the review comment The test code looks good, except for the little comment above :) I'm running some testing now... ------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2731982794 From epeter at openjdk.org Tue Mar 18 07:53:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 07:53:08 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 07:46:17 GMT, Xiaohong Gong wrote: >> test/hotspot/jtreg/compiler/vectorapi/VectorRearrangeTest.java line 307: >> >>> 305: public static void main(String[] args) { >>> 306: TestFramework testFramework = new TestFramework(); >>> 307: testFramework.setDefaultWarmup(10000) >> >> Oh, I just see that you are modifying the warmup. Is that necessary? 
> > Yes, I think we'd better use a larger warmup to make sure the vector api intrinsics are inlined in C2, so that the IR check can pass. Did you try without? The default warmup should be sufficient I think. But I could be wrong. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r2000403368 From epeter at openjdk.org Tue Mar 18 07:57:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 07:57:07 GMT Subject: RFR: 8352020: [CompileFramework] enable compilation for VectorAPI [v2] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 07:45:15 GMT, Christian Hagedorn wrote: >> It would also mean that any CompileFraework test always compiles the TestFramework > I don't think so, wouldn't it only load it when you call invokeIrTest()? JTREG would always compile the TestFramework, but the test would not always load the TestFramework class ;) >> And if we decide to do this, I think it would be a separate RFE, > Sure, that's fine. I only just noticed this when reading the comment :-) Ok, good, I'll keep it in mind. I mean it's bothering me a little too, I'm just not sure yet if or how to fix it best. Especially because there are now multiple test-frameworks, and we may want to compile and load any combination of them... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24082#discussion_r2000407812 From xgong at openjdk.org Tue Mar 18 08:04:13 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 18 Mar 2025 08:04:13 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 07:50:51 GMT, Emanuel Peter wrote: >> Yes, I think we'd better use a larger warmup to make sure the vector api intrinsics are inlined in C2, so that the IR check can pass. > > Did you try without? The default warmup should be sufficient I think. But I could be wrong. Yes, actually it can pass without this sometimes. 
I'm afraid the IR test would fail in the future, as I have hit random failures before on other IR tests. I also checked some existing tests under vectorapi; almost all use a warmup of either 5000 or 10000. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r2000419336 From chagedorn at openjdk.org Tue Mar 18 08:05:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 18 Mar 2025 08:05:07 GMT Subject: RFR: 8352020: [CompileFramework] enable compilation for VectorAPI [v2] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 07:06:07 GMT, Emanuel Peter wrote: >> During work on [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942) I discovered that it is currently not possible to compile VectorAPI code because it is still in incubator mode and needs flag "--add-modules=jdk.incubator.vector" for "javac". >> >> Also: "javac" can produce warnings, and that leads to issues like this: [JDK-8351998](https://bugs.openjdk.org/browse/JDK-8351998). We should allow such warnings, they are not compile failures. >> >> Example: >> >> javac --add-modules=jdk.incubator.vector Test.java >> warning: [incubating] using incubating module(s): jdk.incubator.vector >> 1 warning >> >> >> I added an example test as well. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > typo Marked as reviewed by chagedorn (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/24082#pullrequestreview-2693366456 From chagedorn at openjdk.org Tue Mar 18 08:05:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 18 Mar 2025 08:05:07 GMT Subject: RFR: 8352020: [CompileFramework] enable compilation for VectorAPI [v2] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 07:54:17 GMT, Emanuel Peter wrote: > > > It would also mean that any CompileFraework test always compiles the TestFramework > > > I don't think so, wouldn't it only load it when you call invokeIrTest()? > > JTREG would always compile the TestFramework, but the test would not always load the TestFramework class ;) Ah yes, you're right, that's true. > > > And if we decide to do this, I think it would be a separate RFE, > > > Sure, that's fine. I only just noticed this when reading the comment :-) > > Ok, good, I'll keep it in mind. I mean it's bothering me a little too, I'm just not sure yet if or how to fix it best. Especially because there are now multiple test-frameworks, and we may want to compile and load any combination of them... Yeah, me too - maybe it's worth to revisit this again and discuss possible low-overhead solutions. > But I'm not sure which test libraries we should always load... maybe we can address this down the road, when it really becomes cumbersome for people, and we know more what we want? That's really a lot of compile statements you need to make sure to add. Let's discuss this again later and go with what we have now. 
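The difference between compiling a class and loading it, as discussed above, can be seen in a small standalone Java sketch. The names here (`LazyLoadSketch`, `Helper`, `invokeWithHelper`) are hypothetical and are not part of the real `TestFramework` or `CompileFramework` API. The sketch only relies on the Java rule that a class is initialized on first use, for example on the first call of one of its static methods.

```java
public class LazyLoadSketch {
    // Set by Helper's static initializer, so we can observe when Helper is initialized.
    static boolean helperInitialized = false;

    // Stand-in for a library class that is always compiled alongside the test,
    // but only initialized when something actually uses it.
    static class Helper {
        static {
            helperInitialized = true;
        }

        // Empty static method: calling it is enough to trigger initialization.
        static void touch() {
        }
    }

    // Stand-in for an entry point that needs the helper class to be initialized first.
    static void invokeWithHelper() {
        Helper.touch();
    }

    public static void main(String[] args) {
        System.out.println(helperInitialized); // Helper has not been used yet
        invokeWithHelper();
        System.out.println(helperInitialized); // Helper was initialized by the call
    }
}
```

Tests that never call `invokeWithHelper()` pay the compilation cost for `Helper` but never initialize it, which mirrors the point made in the thread.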
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24082#discussion_r2000417334 From epeter at openjdk.org Tue Mar 18 08:06:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 08:06:11 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v2] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 03:22:22 GMT, Qizheng Xing wrote: >> Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into enhance-loop-safepoint-elim >> - Add IR test and microbench. >> - Make `PhaseIdealLoop` eliminate more redundant safepoints in loops. > > Hi all, > > This patch has now passed all GHA tests and is ready for further reviews. > > If there are any other suggestions for this PR, please let me know. > > Thanks! @MaxXSoft I'm not an expert with SafePoints, but I'd be willing to review if you answer my questions above, and maybe some more I'll have later ;) One question I just had now: Assume we now remove the SafePoint because there is that other call above. But what if later we inline that call - do we still have some SafePoint after that in the loop? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-2732027766 From duke at openjdk.org Tue Mar 18 08:12:14 2025 From: duke at openjdk.org (kuaiwei) Date: Tue, 18 Mar 2025 08:12:14 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: Message-ID: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> On Tue, 18 Mar 2025 07:37:12 GMT, kuaiwei wrote: >> In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine. 
>>
>> The benchmark result of MergeLoadBench.java
>> AMD EPYC 9T24 96-Core Processor:
>>
>> |name | -MergeLoads | +MergeLoads |delta|
>> |---|---|---|---|
>> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 |
>> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 |
>> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 |
>> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 |
>> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 |
>> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 |
>> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 |
>> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 |
>> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 |
>> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 |
>> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 |
>> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 |
>> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 |
>> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 |
>> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 |
>> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 |
>> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 |
>> |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 |
>> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 |
>> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 |
>> |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 |
>> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 |
>> |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 |
>> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 |
>> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 |
>> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 |
>> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 |
>> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 |
>> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 |
>> |Merg...
> > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Revert extract value and add more tests @eme64 @robcasloz I think the patch for merge loads optimization is ready for PR, could you take time to review it? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2732040941 From dlong at openjdk.org Tue Mar 18 08:27:12 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 18 Mar 2025 08:27:12 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v6] In-Reply-To: References: <8BMaz7Ssnwk6ywuPYfgaL1VSN7XZYBzpvMWRsG12-Eg=.441d03b6-4a0b-479b-b789-d9bfd03bd346@github.com> Message-ID: On Mon, 17 Mar 2025 19:48:35 GMT, Vladimir Kozlov wrote: >> I believe this should behave the same as creating any other nmethod. `CodeCache_lock` is the only thing the other constructors use when initializing nmethods and if this was an issue they could also encounter the same race between initialization and setting the field. >> >> Looking through the GC code also shows they do hold `CodeCache_lock` before scans. [G1](https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/gc/g1/g1HeapRegionRemSet.cpp#L112), [Shenandoah](https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/gc/shenandoah/shenandoahCodeRoots.cpp#L195), [ZGC](https://github.com/openjdk/jdk/blob/19154f7af34bf6f13d61d7a9f05d6277964845d8/src/hotspot/share/gc/z/zNMethodTable.cpp#L215) > > Thank you for checking it. Other nmethod contructors don't have the same locking requirements, because the nmethod hasn't been registered with GC yet. However, for the source nmethod, it could be concurrently patched by GC threads without codeCache_lock and only the per-nmethod CompiledICLocker locking mechanism. 
So using memcpy() seems problematic here, because a byte-by-byte copy might see on partial updates from NativeCall::set_destination_mt_safe, for example. Also, there seems to be a critical race with GC here. The destination nmethod isn't going to be registered with GC yet, correct? In that case, GC may patch the source nmethod right after the copy, but before the destination copy is registered, leaving the destination copy with stale data. This seems fatal, as I believe this breaks crucial invariants preventing call sites from referencing stale data. @fisk, am I on the right track here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2000455741 From epeter at openjdk.org Tue Mar 18 08:37:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 08:37:15 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 07:37:12 GMT, kuaiwei wrote: >> In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine. 
> > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Revert extract value and add more tests @kuaiwei Thanks for working on this! I had a quick look at the IR tests and left a few comments already. Additionally, it would be good if you had some "chaotic" tests, i.e. ones where the loads are reordered but could still be merged, and some that should not be merged. I'll keep reviewing now... test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 44: > 42: * @run main compiler.c2.TestMergeLoads aligned > 43: * > 44: * @requires os.arch != "riscv64" | vm.cpu.features ~= ".*zbb.*" Can you remove this global requirement, so that those platforms can at least do result verification? You can always add restrictions to the `@IR` rules. test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 55: > 53: byte[] aB = new byte[RANGE]; > 54: char[] aC = new char[RANGE]; > 55: short[] aS = new short[RANGE]; What about merging loads on `int[]`? test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 69: > 67: switch (args[0]) { > 68: case "aligned" -> { framework.addFlags("-XX:-UseUnalignedAccesses"); } > 69: case "unaligned" -> { framework.addFlags("-XX:+UseUnalignedAccesses"); } Can you please also add an explicit run with `StressIGVN`? Because the flag is not whitelisted for the TestFramework, and so if it was set from the outside, the IR rules would not be executed. But it would be nice that your algorithm is stable to reorderings in IGVN ;) test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 145: > 143: a[i] = (short)RANDOM.nextInt(); > 144: } > 145: } Suggestion: } Please put a space between methods ;) test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 235: > 233: } > 234: } > 235: } In the meantime, I've developed a `Verify.java`, exactly for this. 
Would you mind using it, it would reduce the amount of code here quite a bit ;) test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 290: > 288: int[] ret = {i1}; > 289: return new Object[]{ret}; > 290: } Ah, I see you are using `|`. Can we also use `+`? FYI: if you use `Verify.java`, then you could directly do `return i2;`. It would get returned as a boxed `Integer`, and should be compared that way, without allocating an `int[]` and `Object[]`. test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 423: > 421: @IR(counts = {IRNode.LOAD_I_OF_CLASS, "byte\\\\[int:>=0] \\\\(java/lang/Cloneable,java/io/Serializable\\\\)", "1"}, > 422: applyIf = {"UseUnalignedAccesses", "true"}, > 423: applyIfPlatform = {"big-endian", "true"}) Can you please also check for byte and char loads? It would make sure that we do not have any loads that we are not expecting, and that the graph was cleaned appropriately. ------------- PR Review: https://git.openjdk.org/jdk/pull/24023#pullrequestreview-2693406473 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000438853 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000448210 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000446641 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000453897 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000455701 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000459936 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000465656 From chagedorn at openjdk.org Tue Mar 18 08:45:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 18 Mar 2025 08:45:12 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable [v2] In-Reply-To: <9OtjMzuRRR6XQV2puQbP_rfhzLYI0lCgaJtaGqbpcpk=.c4b10cdc-0afd-40a4-a916-c1b1be2bf2e1@github.com> References: 
<9OtjMzuRRR6XQV2puQbP_rfhzLYI0lCgaJtaGqbpcpk=.c4b10cdc-0afd-40a4-a916-c1b1be2bf2e1@github.com> Message-ID: On Tue, 18 Mar 2025 07:12:03 GMT, Emanuel Peter wrote: >> With the Template Framework, I'm generating IR tests randomly. But random code can always hit bailouts in compilation, and make code not compilable any more. We should have a way to disable this check, and just gracefully continue to execute the tests. >> >> To allow a single test method to be `not compilable`: >> https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L160-L161 >> >> To allow all test methods to be `not compilable`: >> https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L140-L144 >> >> See also this documentation in the code: >> https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/compiler/lib/ir_framework/Test.java#L88-L94 >> >> --------------------------------------- >> >> **Backrgound** >> >> My random code seems to hit a bailout in the Register Allocator, and I cannot do much to predict if that bailout happens. >> See https://bugs.openjdk.org/browse/JDK-8304328 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > documentation from Christian test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 418: > 416: * true, which allows any test to pass even if there is a compilation bailout. If only selected methods are prone > 417: * to bail out, it is preferred to use {@link Test#allowNotCompilable()} instead for more fine-grained control. > 418: * By setting this flag, any associated {@link IR} rule of a test is only executed if the test method was compiled, Whitespace error. 
test/hotspot/jtreg/compiler/lib/ir_framework/test/AbstractTest.java line 122: > 120: tryCompileMethod(test); > 121: } catch (MethodNotCompilableException e) { > 122: TestRun.check(test.isAllowNotCompilable(), "Only allowNotCompilable methods should throw MethodNotCompilableException."); Should we emit a log message here in case we have an expected compilation bailout here? test/hotspot/jtreg/compiler/lib/ir_framework/test/AbstractTest.java line 175: > 173: } > 174: TestRun.check(WHITE_BOX.isMethodCompilable(testMethod, test.getCompLevel().getValue(), false), > 175: "Method " + testMethod + " not compilable (anymore) at level " + test.getCompLevel()); You now check `isMethodCompilable()` twice. You could move the second check inside the `catch` block for `MethodNotCompilableException`. Then you can use this exception whenever there is a method that is not compilable (as the class name suggests) and then only do the separation between allowed and disallowed cases there. test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java line 102: > 100: throw new RuntimeException("should have thrown TestRunException"); > 101: } catch (TestVMException e) { > 102: } catch (IRViolationException e) {} This is a mismatch. Which exceptions are now expected? Same below. 
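
The restructuring suggested above for `AbstractTest.java` (throw the exception whenever a method is not compilable, and separate the allowed from the disallowed case only in the `catch` block, logging the expected bailout) might look roughly like the sketch below. This is not the framework's code: `BailoutHandlingSketch`, `tryCompile`, `compileOrSkip`, `Outcome`, and the nested exception class are hypothetical stand-ins, and the compilability check is passed in as a plain predicate instead of going through the WhiteBox API.

```java
import java.util.function.Predicate;

final class BailoutHandlingSketch {
    /** Hypothetical stand-in for the framework's exception of the same name. */
    static final class MethodNotCompilableException extends RuntimeException {
        MethodNotCompilableException(String message) {
            super(message);
        }
    }

    enum Outcome { COMPILED, SKIPPED_ALLOWED_BAILOUT }

    // Single place that decides "not compilable": it always throws in that case,
    // regardless of whether the bailout is allowed for this test.
    static void tryCompile(String method, Predicate<String> isCompilable) {
        if (!isCompilable.test(method)) {
            throw new MethodNotCompilableException("Method " + method + " not compilable");
        }
    }

    // The allowed/disallowed separation happens only in the catch block.
    static Outcome compileOrSkip(String method, boolean allowNotCompilable,
                                 Predicate<String> isCompilable) {
        try {
            tryCompile(method, isCompilable);
            return Outcome.COMPILED;
        } catch (MethodNotCompilableException e) {
            if (!allowNotCompilable) {
                // Disallowed bailout: report it as a test failure.
                throw new IllegalStateException(
                    "Only allowNotCompilable methods may bail out: " + method, e);
            }
            // Allowed bailout: log it so the skipped compilation is visible.
            System.out.println("Expected compilation bailout for " + method);
            return Outcome.SKIPPED_ALLOWED_BAILOUT;
        }
    }
}
```

With this shape the compilability check runs once, and the log line covers the question of whether an expected bailout should be reported.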
test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java line 403: > 401: @Override > 402: public void visitMethodNotCompilable(Method method, int failedIRRules) { > 403: throw new RuntimeException("Should not reach here"); Maybe change to: Suggestion: throw new RuntimeException("No test should bailout from compilation"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2000480128 PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2000446479 PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2000473330 PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2000477309 PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2000478578 From duke at openjdk.org Tue Mar 18 08:51:10 2025 From: duke at openjdk.org (kuaiwei) Date: Tue, 18 Mar 2025 08:51:10 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 08:13:53 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert extract value and add more tests > > test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 44: > >> 42: * @run main compiler.c2.TestMergeLoads aligned >> 43: * >> 44: * @requires os.arch != "riscv64" | vm.cpu.features ~= ".*zbb.*" > > Can you remove this global requirement, so that those platforms can at least do result verification? > You can always add restrictions to the `@IR` rules. Ok, I will add them to @IR rules. > test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 69: > >> 67: switch (args[0]) { >> 68: case "aligned" -> { framework.addFlags("-XX:-UseUnalignedAccesses"); } >> 69: case "unaligned" -> { framework.addFlags("-XX:+UseUnalignedAccesses"); } > > Can you please also add an explicit run with `StressIGVN`? 
Because the flag is not whitelisted for the TestFramework, and so if it was set from the outside, the IR rules would not be executed. But it would be nice that your algorithm is stable to reorderings in IGVN ;) I think my optimization is not dependent on order of IGVN. I will verify it with this option. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000490619 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000495030 From duke at openjdk.org Tue Mar 18 08:59:14 2025 From: duke at openjdk.org (kuaiwei) Date: Tue, 18 Mar 2025 08:59:14 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 08:19:47 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert extract value and add more tests > > test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 55: > >> 53: byte[] aB = new byte[RANGE]; >> 54: char[] aC = new char[RANGE]; >> 55: short[] aS = new short[RANGE]; > > What about merging loads on `int[]`? Now there's limit to merge 2 LoadI as LoadL. For byte and short, there's already unsigned load for them in C2, so they can extend safely. But there's no unsigned load for integer, so I stop merging 2 integer load in this patch. I will check if it can be done in other way. 
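
As a source-level analogy for the point about unsigned loads (not a description of the C2 code): widening an `int` to `long` sign-extends, so two ints can only be OR'ed into one long if the low half is zero-extended with a mask first. The class and method names below are made up for illustration.

```java
final class IntMergeSketch {
    // Naive merge: the implicit int-to-long conversion of 'lo' sign-extends,
    // so a negative 'lo' overwrites the upper 32 bits with ones.
    static long mergeNaive(int lo, int hi) {
        return ((long) hi << 32) | lo;
    }

    // Correct merge: mask 'lo' to zero-extend it before OR'ing in the high half.
    static long mergeMasked(int lo, int hi) {
        return ((long) hi << 32) | (lo & 0xFFFF_FFFFL);
    }

    public static void main(String[] args) {
        int lo = 0x80000000; // negative as an int
        int hi = 1;
        System.out.println(Long.toHexString(mergeNaive(lo, hi)));
        System.out.println(Long.toHexString(mergeMasked(lo, hi)));
    }
}
```

For `byte` and `short` elements the corresponding masks are `0xff` and `0xffff`, which is the case the patch currently handles.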
> test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 145: > >> 143: a[i] = (short)RANDOM.nextInt(); >> 144: } >> 145: } > > Suggestion: > > } > > > Please put a space between methods ;) Ok ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000516075 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000518140 From epeter at openjdk.org Tue Mar 18 09:05:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 09:05:17 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 07:37:12 GMT, kuaiwei wrote: >> In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine. >> >> The benchmark result of MergeLoadBench.java >> AMD EPYC 9T24 96-Core Processor: >> >> |name | -MergeLoads | +MergeLoads |delta| >> |---|---|---|---| >> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 | >> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 | >> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 | >> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 | >> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 | >> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 | >> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 | >> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 | >> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 | >> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 | >> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 | >> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 | >> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 | >> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 | >> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 | >> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 | >> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 | >> 
|MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 | >> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 | >> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 | >> |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 | >> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 | >> |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 | >> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 | >> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 | >> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 | >> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 | >> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 | >> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 | >> |Merg... > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Revert extract value and add more tests Ok, I gave it a quick first scan, and have some more questions and suggestions :) src/hotspot/share/opto/memnode.cpp line 1853: > 1851: * +---> Or2 <----+ | > 1852: * | | > 1853: * +-----> Or3 <------+ The code above has masking, the graph not. Can you add an explanatory comment, please ;) src/hotspot/share/opto/memnode.cpp line 1855: > 1853: * +-----> Or3 <------+ > 1854: * > 1855: * It will be transformed as a merged LoadI and replace the Or3 node Suggestion: * It is transformed as a merged LoadI, which replaces the Or3 node. src/hotspot/share/opto/memnode.cpp line 1862: > 1860: /* > 1861: * LoadNode and OrNode pair which represent an item for merging, > 1862: * And we can get some properties like shift and last_op from it. To me it seems that `load` and `shift` are the relevant elements here, and the `or` and `last_op` are the "side-info", only used if it is the last op. Is that correct? 
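
For reference, the Java-level shape these comments are about (byte loads that are masked, shifted, and OR'ed together) can be sketched as follows, next to a single wide read that yields the same value. This is only an illustration under the assumption that the patch targets this kind of expression; the class and method names are hypothetical, and `ByteBuffer` is used here merely as a reference implementation of "one 4-byte load".

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class MergeLoadsPatternSketch {
    // Four byte loads; each is masked with 0xff to zero-extend it, shifted into
    // place, and OR'ed together. The lowest address ends up in the lowest bits.
    static int getIntLE(byte[] a, int i) {
        return (a[i] & 0xff)
             | ((a[i + 1] & 0xff) << 8)
             | ((a[i + 2] & 0xff) << 16)
             | ((a[i + 3] & 0xff) << 24);
    }

    // Same pattern with the lowest address in the highest bits.
    static int getIntBE(byte[] a, int i) {
        return ((a[i] & 0xff) << 24)
             | ((a[i + 1] & 0xff) << 16)
             | ((a[i + 2] & 0xff) << 8)
             | (a[i + 3] & 0xff);
    }

    // Reference: one 4-byte read at index i in the given byte order.
    static int wideLoad(byte[] a, int i, ByteOrder order) {
        return ByteBuffer.wrap(a).order(order).getInt(i);
    }
}
```

The masks matter because `byte` is signed in Java: without `& 0xff`, a negative byte would sign-extend and set all higher bits of the OR result.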
src/hotspot/share/opto/memnode.cpp line 1877: > 1875: _last_op(false) {} > 1876: void set_last_op(bool v) { _last_op = v; } > 1877: bool last_op() const { return _last_op; } Is it feasible to make all fields `const`? It can often make reasoning about code easier if you know that there can be no modifications. Not sure about `_last_op`, I'll have to keep reading to find out. src/hotspot/share/opto/memnode.cpp line 1896: > 1894: LowToHigh, // Adjacent and first load access low address > 1895: HighToLow, // Adjacent and first load access high address > 1896: NotAdjacent // Not adjacent What happens if an `OrNode` has its inputs swapped? This can happen if the node idx are the "wrong way around". See `commute`, comment `Otherwise, sort inputs (commutativity) to help value numbering`. I don't know how likely this is to happen. What do you think? src/hotspot/share/opto/memnode.cpp line 1902: > 1900: PhaseGVN* const _phase; > 1901: LoadNode* const _load; > 1902: int _last_op_index; // Index of the last item in merged_list What is the `merged_list`? I could not find it. src/hotspot/share/opto/memnode.cpp line 1976: > 1974: // Go through ConvI2L which is unique output of the load > 1975: Node* MergePrimitiveLoads::by_pass_i2l(const LoadNode* l) { > 1976: if ( l != nullptr && l->outcnt() == 1 && l->unique_out()->Opcode() == Op_ConvI2L) { Suggestion: if (l != nullptr && l->outcnt() == 1 && l->unique_out()->Opcode() == Op_ConvI2L) { src/hotspot/share/opto/memnode.cpp line 1979: > 1977: return l->unique_out(); > 1978: } else { > 1979: return (Node*)l; Hmm, I don't like casting away `const`... is there a way to avoid this? src/hotspot/share/opto/memnode.cpp line 2522: > 2520: MergePrimitiveLoads merge(phase, this); > 2521: Node* merged = merge.run(); > 2522: if (merged != nullptr) { return merged; } I'm a little confused here... So imagine we have a `LoadB` here. How can we now return a `LoadI` instead, and replace all uses of the `LoadB` with the `LoadI`? 
Should we not be replacing the `OrI` instead? ------------- PR Review: https://git.openjdk.org/jdk/pull/24023#pullrequestreview-2693479230 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000489282 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000478996 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000495348 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000484074 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000505715 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000508791 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000516974 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000512944 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000526992 From epeter at openjdk.org Tue Mar 18 09:05:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 09:05:17 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 08:54:14 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert extract value and add more tests > > src/hotspot/share/opto/memnode.cpp line 1979: > >> 1977: return l->unique_out(); >> 1978: } else { >> 1979: return (Node*)l; > > Hmm, I don't like casting away `const`... is there a way to avoid this? Could the output pointer be `const`? 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000515583

From duke at openjdk.org Tue Mar 18 09:05:18 2025
From: duke at openjdk.org (kuaiwei)
Date: Tue, 18 Mar 2025 09:05:18 GMT
Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 18 Mar 2025 08:27:44 GMT, Emanuel Peter wrote:

>> kuaiwei has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Revert extract value and add more tests
>
> test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 290:
>
>> 288: int[] ret = {i1};
>> 289: return new Object[]{ret};
>> 290: }
>
> Ah, I see you are using `|`. Can we also use `+`?
>
> FYI: if you use `Verify.java`, then you could directly do `return i2;`. It would get returned as a boxed `Integer`, and should be compared that way, without allocating an `int[]` and `Object[]`.

Now I only check '|', so they can not be merged if using '+'. But I think it can be extended.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000530238

From duke at openjdk.org Tue Mar 18 09:17:12 2025
From: duke at openjdk.org (kuaiwei)
Date: Tue, 18 Mar 2025 09:17:12 GMT
Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4]
In-Reply-To:
References:
Message-ID: <8pcR6tQ3Zv8FRCLRxaG57NuZlVDB4LD9mCSKgHmlKEs=.a04c40ed-4d80-4eea-a573-abb3446d1ab9@github.com>

On Tue, 18 Mar 2025 08:52:09 GMT, Emanuel Peter wrote:

>> kuaiwei has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Revert extract value and add more tests
>
> src/hotspot/share/opto/memnode.cpp line 1896:
>
>> 1894: LowToHigh, // Adjacent and first load access low address
>> 1895: HighToLow, // Adjacent and first load access high address
>> 1896: NotAdjacent // Not adjacent
>
> What happens if an `OrNode` has its inputs swapped?
> This can happen if the node idx are the "wrong way around". See `commute`, comment `Otherwise, sort inputs (commutativity) to help value numbering`.
>
> I don't know how likely this is to happen. What do you think?

I think it's OK if the inputs are swapped. I collect the merge info and sort the items by shift value, then check the memory order. So if the shift order follows the memory access order (or its reverse), the loads can be merged.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000553715

From duke at openjdk.org Tue Mar 18 09:21:08 2025
From: duke at openjdk.org (kuaiwei)
Date: Tue, 18 Mar 2025 09:21:08 GMT
Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 18 Mar 2025 09:00:42 GMT, Emanuel Peter wrote:

>> kuaiwei has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Revert extract value and add more tests
>
> src/hotspot/share/opto/memnode.cpp line 2522:
>
>> 2520: MergePrimitiveLoads merge(phase, this);
>> 2521: Node* merged = merge.run();
>> 2522: if (merged != nullptr) { return merged; }
>
> I'm a little confused here... So imagine we have a `LoadB` here. How can we now return a `LoadI` instead, and replace all uses of the `LoadB` with the `LoadI`? Should we not be replacing the `OrI` instead?

I'm not clear about it. I think we need replace OrI node here, but return the origin LoadB?
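
The sort-by-shift check described in the reply about swapped `OrNode` inputs could look roughly like the sketch below. It is a guess at the idea, not the patch's implementation: `ShiftOrderSketch`, `Item`, and `classify` are hypothetical names, offsets and shifts are plain integers instead of C2 nodes, and the meaning given to `LOW_TO_HIGH` / `HIGH_TO_LOW` here (address order after sorting by ascending shift) is an assumption that may differ from the patch's `LowToHigh` / `HighToLow`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ShiftOrderSketch {
    enum Order { LOW_TO_HIGH, HIGH_TO_LOW, NOT_ADJACENT }

    /** Hypothetical stand-in for one collected item: load offset in bytes and shift in bits. */
    static final class Item {
        final int offset;
        final int shift;

        Item(int offset, int shift) {
            this.offset = offset;
            this.shift = shift;
        }
    }

    // Sort by shift, so the order in which the items were collected (and hence
    // any swapping of Or inputs) no longer matters, then compare with addresses.
    static Order classify(List<Item> items, int elemBytes) {
        if (items.size() < 2) {
            return Order.NOT_ADJACENT;
        }
        List<Item> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingInt((Item it) -> it.shift));

        boolean ascending = true;
        boolean descending = true;
        for (int k = 1; k < sorted.size(); k++) {
            Item prev = sorted.get(k - 1);
            Item cur = sorted.get(k);
            // Consecutive shifts must differ by exactly one element width.
            if (cur.shift - prev.shift != elemBytes * 8) {
                return Order.NOT_ADJACENT;
            }
            int delta = cur.offset - prev.offset;
            ascending &= (delta == elemBytes);
            descending &= (delta == -elemBytes);
        }
        if (ascending) {
            return Order.LOW_TO_HIGH;
        }
        if (descending) {
            return Order.HIGH_TO_LOW;
        }
        return Order.NOT_ADJACENT;
    }
}
```

Because the items are sorted before the address check, feeding them in any order gives the same classification.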
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000560894 From epeter at openjdk.org Tue Mar 18 09:33:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 09:33:13 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 09:27:20 GMT, kuaiwei wrote: >> src/hotspot/share/opto/memnode.cpp line 1902: >> >>> 1900: PhaseGVN* const _phase; >>> 1901: LoadNode* const _load; >>> 1902: int _last_op_index; // Index of the last item in merged_list >> >> What is the `merged_list`? I could not find it. > > It's defined in MergePrimitiveLoads::run . It looks unnecessary since I have _last_op in MergeLoadInfo. Ah, I think it is a typo, it is `merge_list`. >> src/hotspot/share/opto/memnode.cpp line 2522: >> >>> 2520: MergePrimitiveLoads merge(phase, this); >>> 2521: Node* merged = merge.run(); >>> 2522: if (merged != nullptr) { return merged; } >> >> I'm a little confused here... So imagine we have a `LoadB` here. How can we now return a `LoadI` instead, and replace all uses of the `LoadB` with the `LoadI`? Should we not be replacing the `OrI` instead? > > I'm not clear about it. I think we need replace OrI node here, but return the origin LoadB? Can you please elaborate? I'm not understanding what you are saying. > Now there's limit to merge 2 LoadI as LoadL. Exactly, and that is fine. 
But you need a test where you merge two ints ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000580122 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000572888 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000582314 From duke at openjdk.org Tue Mar 18 09:33:12 2025 From: duke at openjdk.org (kuaiwei) Date: Tue, 18 Mar 2025 09:33:12 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 08:53:02 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert extract value and add more tests > > src/hotspot/share/opto/memnode.cpp line 1902: > >> 1900: PhaseGVN* const _phase; >> 1901: LoadNode* const _load; >> 1902: int _last_op_index; // Index of the last item in merged_list > > What is the `merged_list`? I could not find it. It's defined in MergePrimitiveLoads::run . It looks unnecessary since I have _last_op in MergeLoadInfo. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000576326 From duke at openjdk.org Tue Mar 18 09:38:10 2025 From: duke at openjdk.org (kuaiwei) Date: Tue, 18 Mar 2025 09:38:10 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 08:48:20 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert extract value and add more tests > > src/hotspot/share/opto/memnode.cpp line 1862: > >> 1860: /* >> 1861: * LoadNode and OrNode pair which represent an item for merging, >> 1862: * And we can get some properties like shift and last_op from it. 
> > To me it seems that `load` and `shift` are the relevant elements here, and the `or` and `last_op` are the "side-info", only used if it is the last op. Is that correct? The `or` can be used to determine whether two load items are reachable in the same `or` chain. After loop unrolling, the code may look like: ((a[0] & 0xff) << 24) | ((a[1] & 0xff) << 16) | ((a[2] & 0xff) << 8) | (a[3] & 0xff); ((a[4] & 0xff) << 24) | ((a[5] & 0xff) << 16) | ((a[6] & 0xff) << 8) | (a[7] & 0xff); ... For a given shift value, we may find multiple (load, or) items. We can check the reachability of the `or` nodes to know which one is relevant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000591716 From epeter at openjdk.org Tue Mar 18 09:39:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 09:39:01 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable [v3] In-Reply-To: References: Message-ID: > With the Template Framework, I'm generating IR tests randomly. But random code can always hit bailouts in compilation, and make code not compilable any more. We should have a way to disable this check, and just gracefully continue to execute the tests.
> > To allow a single test method to be `not compilable`: > https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L160-L161 > > To allow all test methods to be `not compilable`: > https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L140-L144 > > See also this documentation in the code: > https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/compiler/lib/ir_framework/Test.java#L88-L94 > > --------------------------------------- > > **Background** > > My random code seems to hit a bailout in the Register Allocator, and I cannot do much to predict if that bailout happens. > See https://bugs.openjdk.org/browse/JDK-8304328 Emanuel Peter has updated the pull request incrementally with three additional commits since the last revision: - Merge branch 'JDK-8351952-IR-Framework-not-compilable' of https://github.com/eme64/jdk into JDK-8351952-IR-Framework-not-compilable - Update test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java Co-authored-by: Christian Hagedorn - more for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24049/files - new: https://git.openjdk.org/jdk/pull/24049/files/e98dd89a..f3bdb54f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24049&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24049&range=01-02 Stats: 26 lines in 5 files changed: 11 ins; 8 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24049.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24049/head:pull/24049 PR: https://git.openjdk.org/jdk/pull/24049 From epeter at openjdk.org Tue Mar 18 09:39:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 09:39:01 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable
[v3] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 20:47:10 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with three additional commits since the last revision: >> >> - Merge branch 'JDK-8351952-IR-Framework-not-compilable' of https://github.com/eme64/jdk into JDK-8351952-IR-Framework-not-compilable >> - Update test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java >> >> Co-authored-by: Christian Hagedorn >> - more for Christian > > Some first comments, will continue tomorrow :-) @chhagedorn Thanks for your second pass! I addressed all your comments :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24049#issuecomment-2732314204 From epeter at openjdk.org Tue Mar 18 09:39:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 09:39:05 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable [v2] In-Reply-To: References: <9OtjMzuRRR6XQV2puQbP_rfhzLYI0lCgaJtaGqbpcpk=.c4b10cdc-0afd-40a4-a916-c1b1be2bf2e1@github.com> Message-ID: <-XyRNGT6Ogeh07RUJL3apgZY9cGNA0t39DXn3JdNJkY=.215447ea-0fae-4fe5-b25c-d2506959d427@github.com> On Tue, 18 Mar 2025 08:40:53 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> documentation from Christian > > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 418: > >> 416: * true, which allows any test to pass even if there is a compilation bailout. If only selected methods are prone >> 417: * to bail out, it is preferred to use {@link Test#allowNotCompilable()} instead for more fine-grained control. >> 418: * By setting this flag, any associated {@link IR} rule of a test is only executed if the test method was compiled, > > Whitespace error.
fixed > test/hotspot/jtreg/compiler/lib/ir_framework/test/AbstractTest.java line 122: > >> 120: tryCompileMethod(test); >> 121: } catch (MethodNotCompilableException e) { >> 122: TestRun.check(test.isAllowNotCompilable(), "Only allowNotCompilable methods should throw MethodNotCompilableException."); > > Should we emit a log message here in case we have an expected compilation bailout here? done! > test/hotspot/jtreg/compiler/lib/ir_framework/test/AbstractTest.java line 175: > >> 173: } >> 174: TestRun.check(WHITE_BOX.isMethodCompilable(testMethod, test.getCompLevel().getValue(), false), >> 175: "Method " + testMethod + " not compilable (anymore) at level " + test.getCompLevel()); > > You now check `isMethodCompilable()` twice. You could move the second check inside the `catch` block for `MethodNotCompilableException`. Then you can use this exception whenever there is a method that is not compilable (as the class name suggests) and then only do the separation between allowed and disallowed cases there. done! > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java line 102: > >> 100: throw new RuntimeException("should have thrown TestRunException"); >> 101: } catch (TestVMException e) { >> 102: } catch (IRViolationException e) {} > > This is a mismatch. Which exceptions are now expected? Same below. 
Adjusted and added comment :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2000590741 PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2000590141 PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2000590272 PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2000590601 From epeter at openjdk.org Tue Mar 18 09:49:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 09:49:09 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: <8pcR6tQ3Zv8FRCLRxaG57NuZlVDB4LD9mCSKgHmlKEs=.a04c40ed-4d80-4eea-a573-abb3446d1ab9@github.com> References: <8pcR6tQ3Zv8FRCLRxaG57NuZlVDB4LD9mCSKgHmlKEs=.a04c40ed-4d80-4eea-a573-abb3446d1ab9@github.com> Message-ID: On Tue, 18 Mar 2025 09:14:48 GMT, kuaiwei wrote: >> src/hotspot/share/opto/memnode.cpp line 1896: >> >>> 1894: LowToHigh, // Adjacent and first load access low address >>> 1895: HighToLow, // Adjacent and first load access high address >>> 1896: NotAdjacent // Not adjacent >> >> What happens if an `OrNode` has its inputs swapped? This can happen if the node idx are the "wrong way around". See `commute`, comment `Otherwise, sort inputs (commutativity) to help value numbering`. >> >> I don't know how likely this is to happen. What do you think? > > I think it's ok to swap. I collected merged mem info and sorted them by shift value. Then check the memory order. So if shift order follows memory access order (or reverse), they can be merged. Ah nice! That means you could add some tests where the order is shuffled, right? 
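
[Editorial sketch, not part of the thread] The shuffled-order case raised above could be exercised with something like the following. This is only an illustration under stated assumptions: the class and method names (`MergeOrderSketch`, `inOrder`, `shuffled`) are hypothetical and are not taken from the PR's tests, and the sketch checks value equality only, not which IR nodes C2 produces. Both methods assemble the same big-endian int, but the second writes the OR operands in a different order, so shift order no longer matches source order.

```java
public class MergeOrderSketch {
    // OR operands written in memory order: a[i] becomes the high byte.
    static int inOrder(byte[] a, int i) {
        return ((a[i] & 0xff) << 24)
             | ((a[i + 1] & 0xff) << 16)
             | ((a[i + 2] & 0xff) << 8)
             |  (a[i + 3] & 0xff);
    }

    // Same shifts per element, but the OR operands are shuffled.
    // OR is commutative and associative, so the value is unchanged.
    static int shuffled(byte[] a, int i) {
        return ((a[i + 2] & 0xff) << 8)
             | ((a[i] & 0xff) << 24)
             |  (a[i + 3] & 0xff)
             | ((a[i + 1] & 0xff) << 16);
    }

    public static void main(String[] args) {
        byte[] a = {0x12, 0x34, 0x56, 0x78, (byte) 0x9a};
        System.out.println(Integer.toHexString(inOrder(a, 0))); // prints 12345678
        System.out.println(inOrder(a, 0) == shuffled(a, 0));    // prints true
    }
}
```

A real IR-framework test would additionally assert on the node counts after merging; that part is omitted here because the exact rules used in the PR are not shown in this thread.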
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000611346 From epeter at openjdk.org Tue Mar 18 10:04:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 10:04:00 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable [v4] In-Reply-To: References: Message-ID: > With the Template Framework, I'm generating IR tests randomly. But random code can always hit bailouts in compilation, and make code not compilable any more. We should have a way to disable this check, and just gracefully continue to execute the tests. > > To allow a single test method to be `not compilable`: > https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L160-L161 > > To allow all test methods to be `not compilable`: > https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L140-L144 > > See also this documentation in the code: > https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/compiler/lib/ir_framework/Test.java#L88-L94 > > --------------------------------------- > > **Background** > > My random code seems to hit a bailout in the Register Allocator, and I cannot do much to predict if that bailout happens.
> See https://bugs.openjdk.org/browse/JDK-8304328 Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24049/files - new: https://git.openjdk.org/jdk/pull/24049/files/f3bdb54f..259a476c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24049&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24049&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24049.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24049/head:pull/24049 PR: https://git.openjdk.org/jdk/pull/24049 From chagedorn at openjdk.org Tue Mar 18 10:04:02 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 18 Mar 2025 10:04:02 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable [v3] In-Reply-To: References: Message-ID: <2HreqpEbfZW6Zmgg_hJFzAi-lNI46tZI5kmkaLKJfyY=.e4a2fd7c-a197-4760-bec3-2c7fbea7fc32@github.com> On Tue, 18 Mar 2025 09:39:01 GMT, Emanuel Peter wrote: >> With the Template Framework, I'm generating IR tests randomly. But random code can always hit bailouts in compilation, and make code not compilable any more. We should have a way to disable this check, and just gracefully continue to execute the tests. 
>> >> To allow a single test method to be `not compilable`: >> https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L160-L161 >> >> To allow all test methods to be `not compilable`: >> https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L140-L144 >> >> See also this documentation in the code: >> https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/compiler/lib/ir_framework/Test.java#L88-L94 >> >> --------------------------------------- >> >> **Background** >> >> My random code seems to hit a bailout in the Register Allocator, and I cannot do much to predict if that bailout happens. >> See https://bugs.openjdk.org/browse/JDK-8304328 > > Emanuel Peter has updated the pull request incrementally with three additional commits since the last revision: > > - Merge branch 'JDK-8351952-IR-Framework-not-compilable' of https://github.com/eme64/jdk into JDK-8351952-IR-Framework-not-compilable > - Update test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java > > Co-authored-by: Christian Hagedorn > - more for Christian Some small test comments, otherwise, it looks good now. Thanks for the updates! Maybe to share some background about our offline discussions: Conceptually, it would be better if the test VM adjusted the IR encoding when a method is not compilable and this is expected. Then the driver VM would not even need to know about that case and could apply IR matching as it would normally do. However, that turned out to be not very easy to implement. It would require some more extensive refactorings which are out of scope. We might still want to address them at some point but that should be done separately. We decided that the easiest way is to handle it in IR matching for now.
test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java line 100: > 98: try { > 99: framework.start(); > 100: throw new RuntimeException("should have thrown TestRunException or IRViolationException"); To be more explicit since `TestRunException` is only thrown in the test VM which is then noticed by the driver VM and rethrown with a `TestVMException`: Suggestion: throw new RuntimeException("should have thrown TestRunException/TestVMException or IRViolationException"); test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java line 102: > 100: throw new RuntimeException("should have thrown TestRunException or IRViolationException"); > 101: } catch (TestVMException e) { > 102: // Happens when we hit the issue during explicit compilabion by the Framework. Suggestion: // Happens when we hit the issue during explicit compilation by the Framework. test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java line 123: > 121: try { > 122: framework.start(); > 123: throw new RuntimeException("should have thrown TestRunException"); Same here: Suggestion: throw new RuntimeException("should have thrown TestRunException/TestVMException"); ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24049#pullrequestreview-2693718687 PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2000620528 PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2000613990 PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2000622191 From duke at openjdk.org Tue Mar 18 10:33:07 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 18 Mar 2025 10:33:07 GMT Subject: RFR: 8335708: C2: assert(!dead_nodes) failed: using nodes must be reachable from root In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 06:50:29 GMT, Emanuel Peter wrote: > you could also consider changing the PR name. 
Maybe something like graph verification must start at root and safepoints, just like CCP traversal. Maybe you have an even better idea ;) I don't understand what you mean. The title of the PR must match the JBS ticket. I have very little creative freedom there. No? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23977#issuecomment-2732584618 From thartmann at openjdk.org Tue Mar 18 10:36:14 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 18 Mar 2025 10:36:14 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 10:04:00 GMT, Emanuel Peter wrote: >> With the Template Framework, I'm generating IR tests randomly. But random code can always hit bailouts in compilation, and make code not compilable any more. We should have a way to disable this check, and just gracefully continue to execute the tests. >> >> To allow a single test method to be `not compilable`: >> https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L160-L161 >> >> To allow all test methods to be `not compilable`: >> https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L140-L144 >> >> See also this documentation in the code: >> https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/compiler/lib/ir_framework/Test.java#L88-L94 >> >> --------------------------------------- >> >> **Background** >> >> My random code seems to hit a bailout in the Register Allocator, and I cannot do much to predict if that bailout happens.
>> See https://bugs.openjdk.org/browse/JDK-8304328 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24049#pullrequestreview-2693889653 From epeter at openjdk.org Tue Mar 18 10:40:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 10:40:12 GMT Subject: RFR: 8335708: C2: assert(!dead_nodes) failed: using nodes must be reachable from root In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 10:30:24 GMT, Marc Chevalier wrote: > > you could also consider changing the PR name. Maybe something like graph verification must start at root and safepoints, just like CCP traversal. Maybe you have an even better idea ;) > > I don't understand what you mean. the title of the PR must match the JBS ticket. I have very little creative freedom there. No? You can always change both, so that they match ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23977#issuecomment-2732610648 From mli at openjdk.org Tue Mar 18 10:47:21 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Mar 2025 10:47:21 GMT Subject: RFR: 8320997: RISC-V: C2 ReverseV Message-ID: Hi, Can you help to review this patch to implement ReverseV? Thanks! 
------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/24096/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24096&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320997 Stats: 30 lines in 2 files changed: 29 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24096.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24096/head:pull/24096 PR: https://git.openjdk.org/jdk/pull/24096 From mli at openjdk.org Tue Mar 18 10:50:08 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Mar 2025 10:50:08 GMT Subject: RFR: 8352022: RISC-V: Support Zfa fminm_h/fmaxm_h for float16 In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 06:57:29 GMT, Anjian-Wen wrote: > For the support of float16, add the Zfa fminm/fmaxm with the type of float16 > this related to https://bugs.openjdk.org/browse/JDK-8345298 Looks good. ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24047#pullrequestreview-2693955787 From duke at openjdk.org Tue Mar 18 11:04:09 2025 From: duke at openjdk.org (kuaiwei) Date: Tue, 18 Mar 2025 11:04:09 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 09:25:36 GMT, Emanuel Peter wrote: >> I'm not clear about it. I think we need replace OrI node here, but return the origin LoadB? > > Can you please elaborate? I'm not understanding what you are saying. For `LoadB` nodes, I will check they have unique usage to `OrI`. If they can be merged, the merged `LoadI` will replace the last `OrI` node, and all `LoadB` and `OrI` nodes will be dead code. I'm not sure if I can return the merged `LoadI` here. 
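
[Editorial sketch, not part of the thread] To make the `LoadB`/`OrI` discussion above concrete, here is the source-level equivalence that the merge relies on: four single-byte loads combined by an OR chain produce the same value as one 32-bit load of the same memory. The names `MergedLoadSketch`, `bytewise` and `wide` are hypothetical, and `ByteBuffer` is used only as a convenient stand-in for a wide load. It says nothing about how C2 rewires the graph, which is the open question in the message.

```java
import java.nio.ByteBuffer;

public class MergedLoadSketch {
    // Four byte loads, each feeding only the OR chain (big-endian layout).
    static int bytewise(byte[] a) {
        return ((a[0] & 0xff) << 24)
             | ((a[1] & 0xff) << 16)
             | ((a[2] & 0xff) << 8)
             |  (a[3] & 0xff);
    }

    // One 32-bit read of the same bytes; ByteBuffer is big-endian by default.
    static int wide(byte[] a) {
        return ByteBuffer.wrap(a).getInt(0);
    }

    public static void main(String[] args) {
        byte[] a = {(byte) 0xca, (byte) 0xfe, (byte) 0xba, (byte) 0xbe};
        System.out.println(bytewise(a) == wide(a)); // prints true
    }
}
```

Because the value of the whole OR chain equals the wide load, it is the final OR (not any individual byte load) whose uses can be redirected to the merged load; the byte loads and intermediate ORs then have no remaining users.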
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000777557 From duke at openjdk.org Tue Mar 18 11:04:15 2025 From: duke at openjdk.org (Anjian-Wen) Date: Tue, 18 Mar 2025 11:04:15 GMT Subject: Integrated: 8352022: RISC-V: Support Zfa fminm_h/fmaxm_h for float16 In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 06:57:29 GMT, Anjian-Wen wrote: > For the support of float16, add the Zfa fminm/fmaxm with the type of float16 > this related to https://bugs.openjdk.org/browse/JDK-8345298 This pull request has now been integrated. Changeset: b891bfa7 Author: Anjian-Wen Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/b891bfa7e67c21478475642e2bfa2cdc65a3bffe Stats: 41 lines in 2 files changed: 41 ins; 0 del; 0 mod 8352022: RISC-V: Support Zfa fminm_h/fmaxm_h for float16 Reviewed-by: fyang, mli ------------- PR: https://git.openjdk.org/jdk/pull/24047 From mli at openjdk.org Tue Mar 18 12:20:11 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Mar 2025 12:20:11 GMT Subject: RFR: 8351902: RISC-V: Several tests fail after JDK-8351145 In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 07:31:16 GMT, Robbin Ehn wrote: > I think someone took a short-cut and added flagless vm. The correct requires should be: > > ``` > - * @requires vm.flagless > + * @requires vm.compiler2.enabled & vm.opt.final.UseMD5Intrinsics == true > ``` Seems not, `UseMD5Intrinsics` should be the vm flag to be verified, rather than passed in as a vm flag. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24027#issuecomment-2733011929 From mli at openjdk.org Tue Mar 18 12:24:06 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Mar 2025 12:24:06 GMT Subject: RFR: 8351902: RISC-V: Several tests fail after JDK-8351145 In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 08:57:48 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? 
> These client tests do not seem that useful to me, so the simple solution could be to just disable them on riscv. > > Thanks! I think these tests verify that the crypto intrinsics are enabled based only on CPU features. The failing tests depend on no CPU feature, only on `AvoidUnalignedAccesses`, so we should skip them. Tests such as `TestUseSHA256IntrinsicsOptionOnSupportedCPU` and `TestUseSHA512IntrinsicsOptionOnSupportedCPU` depend on the CPU feature `zvkn`, so they run successfully. We could modify the test framework to support passing `AvoidUnalignedAccesses` as `unsupportedCPUFeatures`, but it seems to me it's not worth doing so, something like below: --- a/test/hotspot/jtreg/compiler/testlibrary/sha/predicate/IntrinsicPredicates.java +++ b/test/hotspot/jtreg/compiler/testlibrary/sha/predicate/IntrinsicPredicates.java @@ -61,7 +61,7 @@ public class IntrinsicPredicates { public static final BooleanSupplier MD5_INSTRUCTION_AVAILABLE = new OrPredicate(new CPUSpecificPredicate("aarch64.*", null, null), - new OrPredicate(new CPUSpecificPredicate("riscv64.*", null, null), + new OrPredicate(new CPUSpecificPredicate("riscv64.*", null, new String[] { "AvoidUnalignedAccesses" }), // x86 variants new OrPredicate(new CPUSpecificPredicate("amd64.*", null, null), new OrPredicate(new CPUSpecificPredicate("i386.*", null, null), @@ -70,7 +70,7 @@ public class IntrinsicPredicates { public static final BooleanSupplier SHA1_INSTRUCTION_AVAILABLE = new OrPredicate(new CPUSpecificPredicate("aarch64.*", new String[] { "sha1" }, null), // SHA-1 intrinsic is implemented with scalar instructions on riscv64 - new OrPredicate(new CPUSpecificPredicate("riscv64.*", null, null), + new OrPredicate(new CPUSpecificPredicate("riscv64.*", null, new String[] { "AvoidUnalignedAccesses" }), new OrPredicate(new CPUSpecificPredicate("s390.*", new String[] { "sha1" }, null), // x86 variants new OrPredicate(new CPUSpecificPredicate("amd64.*", new String[] { "sha" }, null), So, I'll add
some comment about why we disable these tests on riscv. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24027#issuecomment-2733028463 From rcastanedalo at openjdk.org Tue Mar 18 12:34:39 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 18 Mar 2025 12:34:39 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call [v3] In-Reply-To: References: Message-ID: > The [array fill optimization](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/share/opto/loopTransform.cpp#L3533) replaces simple innermost loops that fill an array with copies of the same primitive value: > > > for (int i = 0; i < array.length; i++) { > array[i] = 0; > } > > with a call to an [array filling intrinsic](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp#L1665) that is specialized for the array element type: > > > arrayof_jint_fill(array, 0, array.length) > > > The optimization retrieves the (basic) array element type from calling `MemNode::memory_type()` on the original filling store. This is incorrect for stores of `short` values, since these are represented by `StoreC` nodes [whose `memory_type()` is `T_CHAR`](https://github.com/openjdk/jdk/blob/1fe45265e446eeca5dc496085928ce20863a3172/src/hotspot/share/opto/memnode.hpp#L680). As a result, the optimization wrongly assigns the address type `char[]` to `short` array fill loops. This can cause miscompilations due to missing anti-dependences, see the [issue description for further detail](https://bugs.openjdk.org/projects/JDK/issues/JDK-8351468). > > This changeset proposes retrieving the (basic) array element type from the store address type instead. This ensures that the accurate address type is assigned to the intrinsic call, preventing missed anti-dependences and other potential issues caused by mismatching types.
A more general solution to this issue, and a way to prevent similar bugs in the future, would be to define a `StoreS` node returning the appropriate `memory_type()`. I propose to investigate this in a separate RFE and keep this fix as minimal and non-intrusive as possible for backportability. > > **Testing:** tier1-5, stress testing (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Explicitly disable optimization for mismatching stores; add positive and negative tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24005/files - new: https://git.openjdk.org/jdk/pull/24005/files/90fd7660..38c9b475 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24005&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24005&range=01-02 Stats: 261 lines in 2 files changed: 259 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24005.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24005/head:pull/24005 PR: https://git.openjdk.org/jdk/pull/24005 From mli at openjdk.org Tue Mar 18 12:36:50 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Mar 2025 12:36:50 GMT Subject: RFR: 8352248: Check if CMoveX is supported Message-ID: Hi, Can you help to review this patch? Currently, it seems CMoveX is fully supported on most platforms, except riscv64. On riscv64, there is no efficient way to implement CMoveF/D as for other CMoveX (e.g. CMoveI), but it would still bring benefit to support CMoveX without CMoveF/D. This patch provides such an option. As other platforms already support CMoveX, this patch should not impact them, as `!CMoveNode::supported(_igvn.type(phi))` should always be false. BTW, in a subsequent pr for riscv, I'll implement CMoveX except for CMoveF/D, and also return false for CMoveF/D in Matcher::match_rule_supported. Thanks!
------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/24095/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24095&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352248 Stats: 17 lines in 3 files changed: 16 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24095.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24095/head:pull/24095 PR: https://git.openjdk.org/jdk/pull/24095 From mli at openjdk.org Tue Mar 18 12:37:02 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Mar 2025 12:37:02 GMT Subject: RFR: 8351902: RISC-V: Several tests fail after JDK-8351145 [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this simple patch? > These client tests seems not that useful to me, so the simple solution could be just disable them on riscv. > > Thanks! Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24027/files - new: https://git.openjdk.org/jdk/pull/24027/files/19b55777..4a5a0516 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24027&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24027&range=00-01 Stats: 3 lines in 3 files changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24027.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24027/head:pull/24027 PR: https://git.openjdk.org/jdk/pull/24027 From rcastanedalo at openjdk.org Tue Mar 18 12:43:14 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 18 Mar 2025 12:43:14 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: <4970lfg_SYiaN_khYDdnxKE2_6UgCw0n0T8uG-LZ2n0=.618bffee-9db9-483a-ba13-55992d7d9374@github.com> References: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com>
<4970lfg_SYiaN_khYDdnxKE2_6UgCw0n0T8uG-LZ2n0=.618bffee-9db9-483a-ba13-55992d7d9374@github.com> Message-ID: On Mon, 17 Mar 2025 09:44:09 GMT, Roberto Castañeda Lozano wrote: > > The alternative of using `memory_type()` and introducing a `StoreS` node assumes for correctness that the array fill optimization does not succeed for mismatched stores such as those you mention (e.g. `StoreS` into a `char[]`). > > After some more thought, I lean towards just disabling the `OptimizeFill` optimization for mismatched stores. It does not succeed today anyway due to accidental reasons (brittleness in pattern matching), so disabling it for this case should not have any other impact than making us more confident in the correctness of the optimization. Done now, and also added a set of positive and negative test cases (commit 38c9b475) and updated the PR description. @merykitty hopefully this addresses your concerns, please let me know what you think. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2733076258 From duke at openjdk.org Tue Mar 18 13:07:10 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 18 Mar 2025 13:07:10 GMT Subject: RFR: 8335708: C2: Compile::verify_graph_edges must start at root and safepoints, just like CCP traversal In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 06:41:37 GMT, Emanuel Peter wrote: >> In CCP, we transform the nodes going up (toward inputs) starting from root and safepoints because infinite loops can be reachable from the root, but not co-reachable from the root, that is one can follow def-use from root to the loop, but not the use-def from root to loop.
For more details, see: >> https://github.com/openjdk/jdk/blob/4cf63160ad575d49dbe70f128cd36aba22b8f2ff/src/hotspot/share/opto/phaseX.cpp#L2063-L2070 >> >> Since we are specifically marking nodes as useful if they are above a safepoint, the check that no dead nodes must be there anymore must also consider nodes above a safepoint as alive: the same criterion must apply. We should nevertheless not start from a safepoint killed by CCP. >> >> About the test, I use this trick found in `TestInfiniteLoopCCP` because I indeed need a really infinite loop, but I want a terminating test. The crash is not deterministic, as it needs StressIGVN, so I did a bit of stats. Using a little helper script, on 100 runs, 69 runs fail as in the JBS ticket and 31 are successful (so 0 fail in another way). After the fix, I find 100 successes. >> >> And thanks to @eme64 who extracted such a concise reproducer. > > test/hotspot/jtreg/compiler/loopopts/Test8335708.java line 29: > >> 27: * @summary Crash Compile::verify_graph_edges >> 28: * @requires vm.debug == true & vm.flavor == "server" >> 29: * @library /test/lib > > Suggestion: > > > Can you test if you actually need this line? I mach5'ed it. Without it, the `import jdk.test.lib.Utils;` seems to be failing, and the subsequent `Utils.adjustTimeout(500)` doesn't know about `Utils`. Re-introduced it and it works. So it seems, I need this line. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23977#discussion_r2001001385 From roland at openjdk.org Tue Mar 18 13:49:10 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 18 Mar 2025 13:49:10 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor [v2] In-Reply-To: References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: On Fri, 14 Mar 2025 10:32:01 GMT, Christian Hagedorn wrote: >> This patch cleans the Parse and Template Assertion Predicate elimination code up. We now use a single `PredicateVisitor` and share code in a new `EliminateUselessPredicates` class which contains the code previously found in `PhaseIdealLoop::eliminate_useless_predicates()`. >> >> ### Unified Logic to Clean Up Parse and Template Assertion Predicates >> We now use the following algorithm: >> https://github.com/openjdk/jdk/blob/5e4b6ca0ddafa80eee60690caacd257b74305d4e/src/hotspot/share/opto/predicates.cpp#L1174-L1179 >> >> This is different from the old algorithm where we used a single boolean state `_useless`. But that does no longer work because when we first mark Template Assertion Predicates useless, we are no longer visiting them when iterating through predicates: >> >> https://github.com/openjdk/jdk/blob/a21fa463c4f8d067c18c09a072f3cdfa772aea5e/src/hotspot/share/opto/predicates.hpp#L704-L708 >> >> We therefore require a third state. Thus, I introduced a new tri-state `PredicateState` that provides a special `MaybeUseful` value which we can set each Predicate to. >> >> #### Ignoring Useless Parse Predicates >> While working on this patch, I've noticed that we are always visiting Parse Predicates - even when they useless. We should change that to align it with what we have for the other Predicates (changed in JDK-8351280). 
To make this work, we also replace the `_useless` state in `ParsePredicateNode` with a new `PredicateState`. >> >> #### Sharing Code for Parse and Template Assertion Predicates >> With all the mentioned changes in place, I could nicely share code for the elimination of Parse and Template Assertion Predicates in `EliminateUselessPredicates` by using templates. The following additional changes were required: >> >> - Changing the template parameter of `_template_assertion_predicate_opaques` to the more specific `OpaqueTemplateAssertionPredicateNode` type. >> - Adding accessor methods to get the Predicate lists from `Compile`. >> - Updating `ParsePredicate::mark_useless()` to pass in `PhaseIterGVN`, as done for Assertion Predicates >> >> Note that we still do not directly replace the useless Predicates but rather mark them useless as initiated by JDK-8351280. >> >> ### Other Included Changes >> - During the various refactoring steps, I somehow dropped the code to add newly cloned Template Assertion Predicate to the `_template_assertion_predicate_opaques` list. It was done directly in the old cloning methods. This is not relevant for correctness but could ... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Introduce predicates_enums.hpp Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24013#pullrequestreview-2694628566 From duke at openjdk.org Tue Mar 18 13:52:18 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 18 Mar 2025 13:52:18 GMT Subject: RFR: 8314999: IR framework fails to detect allocation Message-ID: Bring the `ALLOC(_ARRAY)?(_OF)?` IR framework regex in the modern era! Rather than matching on the OptoAssembly, we now match before macro expansion. 
Indeed, matching on OptoAssembly is fragile: between the register being assigned the class to allocate and the actual call to `_new_instance_Java`, there might be a register spill, making the match hard and fragile. Now, these regexes are applied before macro expansion. To make that work, I needed to adapt the dump_spec of `AllocateNode` to print the allocated class, in case it is a precise constant. This is also nice to have as a human reading the output. The new feature is also slightly more precise in case we want to match the allocation of a given class (that is `ALLOC(_ARRAY)?_OF`). It used to be along the lines of: "precise .*" + IS_REPLACED + ":" which is actually too lenient: it only asserts that the suffix is what is expected. On the plus side, if we wanted to have `MyClass`, then `some/package/MyClass` would match, but on the other hand, `ItLooksLikeButIsNotMyClass` would also match. The new regex uses a word boundary: "precise .*\\b" + IS_REPLACED + ":" which makes it a bit more specific: the given name cannot be extended with only letters: there must be a non-letter char. For instance, `ItLooksLikeButIsNotMyClass` wouldn't match anymore, since there is no word boundary between `ItLooksLikeButIsNot` and `MyClass`. It is not quite fool-proof since a package path can still be extended, e.g. @IR(counts = {IRNode.ALLOC_OF, "some/package/MyClass", "1"}) will also match allocations of `a/prefix/some/package/MyClass`. I think it's an acceptable limitation. Let's have a little preview of the new `dump_spec`.
For instance, it could have been something like: 140 Allocate === 120 117 137 8 1 (93 138 23 1 1 10 43 43 10 43 ) [[ 141 142 143 150 151 152 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Test$MyClass:: @ bci:13 (line 41) Test::test @ bci:5 (line 46) !jvms: Test$MyClass:: @ bci:13 (line 41) Test::test @ bci:5 (line 46) and now it is 140 Allocate === 120 117 137 8 1 (93 138 23 1 1 10 43 43 10 43 ) [[ 141 142 143 150 151 152 ]] precise java/util/HashSet: 0x00007fe2244ccd28 (java/lang/Cloneable,java/io/Serializable,java/lang/Iterable,java/util/Collection,java/util/Set):Constant:exact * rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Test$MyClass:: @ bci:13 (line 41) Test::test @ bci:5 (line 46) !jvms: Test$MyClass:: @ bci:13 (line 41) Test::test @ bci:5 (line 46) The `precise java/util/HashSet: 0x00007fe2244ccd28 (java/lang/Cloneable,java/io/Serializable,java/lang/Iterable,java/util/Collection,java/util/Set):Constant:exact *` part is new. It is meant to be like what we can see in `ConP`. 
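The difference between the old and the new `ALLOC_OF` pattern described above can be sketched in plain Java. This is only an illustration under assumptions: the class `AllocRegexDemo` and its two helper methods are hypothetical names, the class name is concatenated into the pattern the way the description shows, and the sample lines are simplified stand-ins for real `dump_spec` output, not the IR framework's actual code.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the IR framework.
final class AllocRegexDemo {
    // Old-style pattern: only checks that the printed type ends with the given name.
    public static boolean lenientMatch(String name, String line) {
        return Pattern.compile("precise .*" + name + ":").matcher(line).find();
    }

    // New-style pattern: additionally requires a word boundary right before the name.
    public static boolean boundedMatch(String name, String line) {
        return Pattern.compile("precise .*\\b" + name + ":").matcher(line).find();
    }

    public static void main(String[] args) {
        // Simplified sample lines (assumed shape, not real compiler output).
        String[] lines = {
            "precise some/package/MyClass: 0x1",
            "precise ItLooksLikeButIsNotMyClass: 0x1",
            "precise a/prefix/some/package/MyClass: 0x1",
        };
        for (String line : lines) {
            System.out.println(line
                + " | lenient(MyClass)=" + lenientMatch("MyClass", line)
                + " | bounded(MyClass)=" + boundedMatch("MyClass", line)
                + " | bounded(some/package/MyClass)="
                + boundedMatch("some/package/MyClass", line));
        }
    }
}
```

In this sketch the bounded pattern rejects `ItLooksLikeButIsNotMyClass` because no word boundary precedes `MyClass`, while a longer package prefix such as `a/prefix/` is still accepted, which is the limitation mentioned above.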
------------- Commit messages: - IR ALLOC before macro expansion Changes: https://git.openjdk.org/jdk/pull/24093/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24093&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314999 Stats: 70 lines in 6 files changed: 29 ins; 13 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/24093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24093/head:pull/24093 PR: https://git.openjdk.org/jdk/pull/24093 From duke at openjdk.org Tue Mar 18 13:57:33 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 18 Mar 2025 13:57:33 GMT Subject: RFR: 8335708: C2: Compile::verify_graph_edges must start at root and safepoints, just like CCP traversal [v2] In-Reply-To: References: Message-ID: > In CCP, we transform the nodes going up (toward inputs) starting from root and safepoints because infinite loops can be reachable from the root, but not co-reachable from the root, that is one can follow def-use from root to the loop, but not the use-def from root to loop. For more details, see: > https://github.com/openjdk/jdk/blob/4cf63160ad575d49dbe70f128cd36aba22b8f2ff/src/hotspot/share/opto/phaseX.cpp#L2063-L2070 > > Since we are specifically marking nodes as useful if they are above a safepoint, the check that no dead nodes must be there anymore must also consider nodes above a safepoint as alive: the same criterion must apply. We should nevertheless not start from a safepoint killed by CCP. > > About the test, I use this trick found in `TestInfiniteLoopCCP` because I indeed need a really infinite loop, but I want a terminating test. The crash is not deterministic, as it needs StressIGVN, so I did a bit of stats. Using a little helper script, on 100 runs, 69 runs fail as in the JBS ticket and 31 are successful (so 0 fail in another way). After the fix, I find 100 successes. > > And thanks to @eme64 who extracted such a concise reproducer. 
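The "reachable but not co-reachable" situation from the description can be sketched on a toy graph. Everything here is an assumption made for illustration: `ReachabilityDemo`, its methods, and the six-edge graph are hypothetical and far simpler than a real C2 graph; the point is only that a use-def walk from the root alone misses an exit-less loop, while adding the safepoint as a start point finds it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model; node names are plain strings.
final class ReachabilityDemo {
    private final Map<String, List<String>> inputs = new HashMap<>();  // use -> defs
    private final Map<String, List<String>> outputs = new HashMap<>(); // def -> uses

    public void addEdge(String def, String use) {
        outputs.computeIfAbsent(def, k -> new ArrayList<>()).add(use);
        inputs.computeIfAbsent(use, k -> new ArrayList<>()).add(def);
    }

    // Generic worklist traversal over one edge direction.
    private static Set<String> walk(Map<String, List<String>> edges, List<String> starts) {
        Set<String> seen = new HashSet<>(starts);
        Deque<String> work = new ArrayDeque<>(starts);
        while (!work.isEmpty()) {
            for (String next : edges.getOrDefault(work.pop(), List.of())) {
                if (seen.add(next)) {
                    work.push(next);
                }
            }
        }
        return seen;
    }

    public Set<String> followDefUse(String... starts) {
        return walk(outputs, List.of(starts));
    }

    public Set<String> followUseDef(String... starts) {
        return walk(inputs, List.of(starts));
    }

    // Toy graph: a normal exit path plus an infinite loop containing a safepoint.
    public static ReachabilityDemo toyGraph() {
        ReachabilityDemo g = new ReachabilityDemo();
        g.addEdge("Root", "Start");     // Start uses Root
        g.addEdge("Start", "Return");   // normal exit path
        g.addEdge("Return", "Root");    // Root uses Return
        g.addEdge("Start", "Loop");     // control enters the loop
        g.addEdge("Loop", "SafePoint");
        g.addEdge("SafePoint", "Loop"); // backedge; the loop never reaches Root
        return g;
    }

    public static void main(String[] args) {
        ReachabilityDemo g = toyGraph();
        System.out.println("def-use from Root: " + g.followDefUse("Root"));
        System.out.println("use-def from Root: " + g.followUseDef("Root"));
        System.out.println("use-def from Root and SafePoint: "
            + g.followUseDef("Root", "SafePoint"));
    }
}
```

In the toy graph, `Loop` shows up in the def-use walk from `Root` and in the use-def walk seeded with both `Root` and `SafePoint`, but not in the use-def walk from `Root` alone, which mirrors why the verification has to use the same start set as the CCP traversal.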
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: various fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23977/files - new: https://git.openjdk.org/jdk/pull/23977/files/a07dc231..6f8fce6e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23977&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23977&range=00-01 Stats: 157 lines in 3 files changed: 81 ins; 69 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23977.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23977/head:pull/23977 PR: https://git.openjdk.org/jdk/pull/23977 From duke at openjdk.org Tue Mar 18 13:57:34 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 18 Mar 2025 13:57:34 GMT Subject: RFR: 8335708: C2: Compile::verify_graph_edges must start at root and safepoints, just like CCP traversal [v2] In-Reply-To: References: Message-ID: <-tZii0FQJQOO_LXPAFddZcM1tNltiXCAVY9autC8064=.48e05b84-f1d6-41da-91e2-78269ffaa404@github.com> On Mon, 17 Mar 2025 06:40:38 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> various fixes > > test/hotspot/jtreg/compiler/loopopts/Test8335708.java line 1: > >> 1: /* > > Can you rename the test file? I think the new common practice is to give it a descriptive name, rather than just the bug number which is already tracked under `@bug 8335708` anyway ;) Done. I think it's more explicit. > test/hotspot/jtreg/compiler/loopopts/Test8335708.java line 28: > >> 26: * @bug 8335708 >> 27: * @summary Crash Compile::verify_graph_edges >> 28: * @requires vm.debug == true & vm.flavor == "server" > > Can we find a way not to have this restriction? It could make sense to still execute this in product, or with other compilers. > > If the issues is with vm flags, then you can always use `-XX:+IgnoreUnrecognizedVMOptions`. Done. 
> test/hotspot/jtreg/compiler/loopopts/Test8335708.java line 35: > >> 33: * -XX:+StressIGVN -Xcomp >> 34: * -XX:CompileCommand=compileonly,compiler.loopopts.Test8335708::mainTest >> 35: * compiler.loopopts.Test8335708 > > Can you please add a run without any flags? Sometimes that allows other bugs to trigger, because it can then be used without any flags, or other flag combinations. Done. I hope it's the right way. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23977#discussion_r2001109017 PR Review Comment: https://git.openjdk.org/jdk/pull/23977#discussion_r2001106171 PR Review Comment: https://git.openjdk.org/jdk/pull/23977#discussion_r2001107606 From duke at openjdk.org Tue Mar 18 14:02:14 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 18 Mar 2025 14:02:14 GMT Subject: RFR: 8335708: C2: Compile::verify_graph_edges must start at root and safepoints, just like CCP traversal [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 06:47:30 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/compile.cpp line 4206: >> >>> 4204: uint stack_size = live_nodes() >> 4; >>> 4205: Node_List nstack(MAX2(stack_size, (uint) OptoNodeListSize)); >>> 4206: if (root_and_safepoints != nullptr) { >> >> Can you say in which cases we don't have `root_and_safepoints`? Why is it ok not to also start at SafePoint in those cases? > > I think you should also say that we start the traversal from Root and Safepoints, just like during CCP. I've improved the comment on the declaration of `verify_graph_edges`: it is the only caller of `verify_bidirectional_edges`, which acts more like a helper, and `verify_graph_edges` is the one called a bit everywhere. Also, I think it's not an implementation detail, but a signature/contract thing: when writing a call to `verify_graph_edges`, I must know what I need to provide in `root_and_safepoints`, or when I can omit it. So now, I hope it's documented. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23977#discussion_r2001122257 From epeter at openjdk.org Tue Mar 18 14:21:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 14:21:10 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor [v2] In-Reply-To: References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: On Fri, 14 Mar 2025 10:32:01 GMT, Christian Hagedorn wrote: >> This patch cleans the Parse and Template Assertion Predicate elimination code up. We now use a single `PredicateVisitor` and share code in a new `EliminateUselessPredicates` class which contains the code previously found in `PhaseIdealLoop::eliminate_useless_predicates()`. >> >> ### Unified Logic to Clean Up Parse and Template Assertion Predicates >> We now use the following algorithm: >> https://github.com/openjdk/jdk/blob/5e4b6ca0ddafa80eee60690caacd257b74305d4e/src/hotspot/share/opto/predicates.cpp#L1174-L1179 >> >> This is different from the old algorithm where we used a single boolean state `_useless`. But that does no longer work because when we first mark Template Assertion Predicates useless, we are no longer visiting them when iterating through predicates: >> >> https://github.com/openjdk/jdk/blob/a21fa463c4f8d067c18c09a072f3cdfa772aea5e/src/hotspot/share/opto/predicates.hpp#L704-L708 >> >> We therefore require a third state. Thus, I introduced a new tri-state `PredicateState` that provides a special `MaybeUseful` value which we can set each Predicate to. >> >> #### Ignoring Useless Parse Predicates >> While working on this patch, I've noticed that we are always visiting Parse Predicates - even when they useless. We should change that to align it with what we have for the other Predicates (changed in JDK-8351280). 
To make this work, we also replace the `_useless` state in `ParsePredicateNode` with a new `PredicateState`. >> >> #### Sharing Code for Parse and Template Assertion Predicates >> With all the mentioned changes in place, I could nicely share code for the elimination of Parse and Template Assertion Predicates in `EliminateUselessPredicates` by using templates. The following additional changes were required: >> >> - Changing the template parameter of `_template_assertion_predicate_opaques` to the more specific `OpaqueTemplateAssertionPredicateNode` type. >> - Adding accessor methods to get the Predicate lists from `Compile`. >> - Updating `ParsePredicate::mark_useless()` to pass in `PhaseIterGVN`, as done for Assertion Predicates >> >> Note that we still do not directly replace the useless Predicates but rather mark them useless as initiated by JDK-8351280. >> >> ### Other Included Changes >> - During the various refactoring steps, I somehow dropped the code to add newly cloned Template Assertion Predicate to the `_template_assertion_predicate_opaques` list. It was done directly in the old cloning methods. This is not relevant for correctness but could ... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Introduce predicates_enums.hpp Generally looks good, I only gave it a quick scan. I'm wondering if there is a naming inconsistency, see below ;) src/hotspot/share/opto/predicates.cpp line 1190: > 1188: } > 1189: mark_non_useful_predicates_useless(); > 1190: } I would replace `non_useful` to `maybe_useful` to keep it consistent with the comments above. 
src/hotspot/share/opto/predicates.cpp line 1195: > 1193: mark_predicates_on_list_maybe_useful(_parse_predicates); > 1194: mark_predicates_on_list_maybe_useful(_template_assertion_predicate_opaques); > 1195: } Suggestion: void EliminateUselessPredicates::mark_all_predicates_maybe_useful() const { mark_predicates_on_list_maybe_useful(_parse_predicates); mark_predicates_on_list_maybe_useful(_template_assertion_predicate_opaques); } ------------- PR Review: https://git.openjdk.org/jdk/pull/24013#pullrequestreview-2694755830 PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r2001163739 PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r2001160291 From epeter at openjdk.org Tue Mar 18 14:21:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 14:21:10 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor [v2] In-Reply-To: References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: On Tue, 18 Mar 2025 14:14:40 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Introduce predicates_enums.hpp > > src/hotspot/share/opto/predicates.cpp line 1195: > >> 1193: mark_predicates_on_list_maybe_useful(_parse_predicates); >> 1194: mark_predicates_on_list_maybe_useful(_template_assertion_predicate_opaques); >> 1195: } > > Suggestion: > > void EliminateUselessPredicates::mark_all_predicates_maybe_useful() const { > mark_predicates_on_list_maybe_useful(_parse_predicates); > mark_predicates_on_list_maybe_useful(_template_assertion_predicate_opaques); > } For consistency? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r2001160628 From shade at openjdk.org Tue Mar 18 14:29:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Mar 2025 14:29:45 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks Message-ID: [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in the queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. The code tries to switch the weak JNI handle with a strong one when it wants to capture the holder. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by the AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in the number of methods in the queue, so we end up calling `CompileTask::is_unloaded` very often. It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer.
Additional testing: - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code - [x] Linux x86_64 server fastdebug, `all` - [x] Linux AArch64 server fastdebug, `all` ------------- Commit messages: - JNIHandles -> VM(Weak) Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8231269 Stats: 78 lines in 2 files changed: 53 ins; 8 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Tue Mar 18 14:29:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Mar 2025 14:29:45 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 19:45:41 GMT, Aleksey Shipilev wrote: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in the queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch the weak JNI handle with a strong one when it wants to capture the holder. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by the AOT code loader.
It also does not help that `CompileTask::select_task` is effectively quadratic in the number of methods in the queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` You can model the impact it has on Leyden-style scenarios by producing many compile tasks with `-Xcomp`. For example, on my 5950X desktop and a simple "Hello World" program that involves lots of javac compilation: Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -Xcomp -XX:TieredStopAtLevel=1 Hello.java # Before Time (mean ± σ): 1.944 s ± 0.011 s [User: 1.822 s, System: 0.154 s] Range (min … max): 1.924 s … 1.956 s 10 runs # After Time (mean ± σ): 1.914 s ± 0.008 s [User: 1.794 s, System: 0.151 s] Range (min … max): 1.900 s … 1.923 s 10 runs The effect is mostly due to avoiding the `OopStorage` locks mentioned in the PR body.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2729534004 From chagedorn at openjdk.org Tue Mar 18 14:51:10 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 18 Mar 2025 14:51:10 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: <2OJ3-DbdBBiHjZKVDGNXb-IFLqXi0jI4Ezz_zs6NsMA=.a904b56a-a3c4-4f06-a708-70223beccb60@github.com> Message-ID: On Fri, 14 Mar 2025 13:33:31 GMT, Roland Westrelin wrote: > Capturing i >= 0 in the loop Phi or array address CastII or ConvI2L then enables better use of address modes on x86. That's very promising. > Except, narrowing the type of the Phi or CastII exposes C2 to the exact bug this PR tries to fix: what if the loop becomes unreachable but C2 can't fold it away and the Phi or CastII end up having an out of range input? But also a problem, indeed. I just think that going into the future, we should still make a reasonable effort to try and let the control path die sanely without needing this patch. It should only serve as a last resort to avoid breaking the graph. While I think it's the safest solution, my concern is that we will not find inefficiencies anymore with this patch. For example, if someone breaks Assertion Predicates, how can we detect this when the graph will always be sane? It's especially tricky now that I'm still adding Assertion Predicate patches and things might break during development and it goes unnoticed. But maybe I just need to turn this patch off locally. A develop flag to turn this patch off could also help but then we have the problem that someone uses the flag and reports an assertion failure that is actually not a real bug because it's one of these kinds of failures we cannot fix otherwise. But I assume it's quite rare that this will happen. > For the test case that I added for this bug, the issue is that some CastII transformations widen the types of some nodes.
I suppose the way to fix this would be to restrict those transformations so widening doesn't happen in some cases. It's going to be tricky (because widening happens so mostly identical CastII nodes can be commoned to improve code quality) and fragile (if to preserve performance, we choose to only restrict those transformations to few targeted cases). It sounds hard to find a control path removal fix for these kinds of issues. Also for the type being zero on the div by zero failing path which lets some type nodes die and control is not because we don't have an "everything but zero" type. > For 8275202, what I tried doing is that when the new pass proves a condition constant, rather than constant fold the condition, it marks the test as always failing/succeeding (so `(If (Bool ...))` is transformed into `(If (Opaque4 (Bool` and the `Opaque4` captures the final result of the `Bool`). Then the `Opaque4` constant folds later. I found several issues with this: Sounds interesting but as you've stated creates new problems. Summarizing my thoughts: - I'm in favor of this patch as a last resort solution for these seemingly unsolvable cases and all future problems. - The unsolvable cases with control not being folded should be documented somewhere - maybe they can be solved in the future. - (Maybe) being able to turn this patch off with a develop flag. That could also allow us to test patches that tackle some of these hard to solve cases at some point. - We should make sure that compilation speed is not significantly affected by doing this search on all dying `Type` nodes (maybe @robcasloz can give you some pointers here - he did some compilation time measurements before).
------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2733529037 From rehn at openjdk.org Tue Mar 18 14:52:09 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 18 Mar 2025 14:52:09 GMT Subject: RFR: 8351902: RISC-V: Several tests fail after JDK-8351145 [v2] In-Reply-To: References: Message-ID: <_AziS-ZaYz3I-9526uk-8vFIwsiNZ8XYlBx_cRCB79c=.a12d92cb-ce66-474e-917f-349f27a468b9@github.com> On Tue, 18 Mar 2025 12:37:02 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this simple patch? >> These client tests seems not that useful to me, so the simple solution could be just disable them on riscv. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > comments Marked as reviewed by rehn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24027#pullrequestreview-2694880143 From rehn at openjdk.org Tue Mar 18 14:52:10 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 18 Mar 2025 14:52:10 GMT Subject: RFR: 8351902: RISC-V: Several tests fail after JDK-8351145 In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 12:22:00 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this simple patch? >> These client tests seems not that useful to me, so the simple solution could be just disable them on riscv. >> >> Thanks! > > I think these tests are to verify the crypto intrinsics to be enable based on only CPU features. > > For these failed tests, they depends on no cpu feature, but `AvoidUnalignedAccesses`, so we should skip them. > For the tests such as `TestUseSHA256IntrinsicsOptionOnSupportedCPU` and `TestUseSHA512IntrinsicsOptionOnSupportedCPU`, they depends on CPU feature `zvkn`, so they run successfully. 
> > We could modify the test framework to support passing `AvoidUnalignedAccesses` as `unsupportedCPUFeatures`, but seems to me it's not worth to do so, something like below: > > --- a/test/hotspot/jtreg/compiler/testlibrary/sha/predicate/IntrinsicPredicates.java > +++ b/test/hotspot/jtreg/compiler/testlibrary/sha/predicate/IntrinsicPredicates.java > @@ -61,7 +61,7 @@ public class IntrinsicPredicates { > > public static final BooleanSupplier MD5_INSTRUCTION_AVAILABLE > = new OrPredicate(new CPUSpecificPredicate("aarch64.*", null, null), > - new OrPredicate(new CPUSpecificPredicate("riscv64.*", null, null), > + new OrPredicate(new CPUSpecificPredicate("riscv64.*", null, new String[] { "AvoidUnalignedAccesses" }), > // x86 variants > new OrPredicate(new CPUSpecificPredicate("amd64.*", null, null), > new OrPredicate(new CPUSpecificPredicate("i386.*", null, null), > @@ -70,7 +70,7 @@ public class IntrinsicPredicates { > public static final BooleanSupplier SHA1_INSTRUCTION_AVAILABLE > = new OrPredicate(new CPUSpecificPredicate("aarch64.*", new String[] { "sha1" }, null), > // SHA-1 intrinsic is implemented with scalar instructions on riscv64 > - new OrPredicate(new CPUSpecificPredicate("riscv64.*", null, null), > + new OrPredicate(new CPUSpecificPredicate("riscv64.*", null, new String[] { "AvoidUnalignedAccesses" }), > new OrPredicate(new CPUSpecificPredicate("s390.*", new String[] { "sha1" }, null), > // x86 variants > new OrPredicate(new CPUSpecificPredicate("amd64.*", new String[] { "sha" }, null), > > > So, I'll add some comment about why we disable these tests on riscv. Yes, I got the test wrong. Thank @Hamlin-Li explaining it to me. I'm fine with just disabling it, as we basically need to predicate on AvoidUnalignedAccesses. Which in its turn is a predicate for UseMD5Instrinsic. So did we even test anything ? That is not clear to me and if there was something tested it seems to be of very little value. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24027#issuecomment-2733524844 From mli at openjdk.org Tue Mar 18 14:52:11 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 18 Mar 2025 14:52:11 GMT Subject: RFR: 8351902: RISC-V: Several tests fail after JDK-8351145 In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 12:22:00 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this simple patch? >> These client tests seems not that useful to me, so the simple solution could be just disable them on riscv. >> >> Thanks! > > I think these tests are to verify the crypto intrinsics to be enable based on only CPU features. > > For these failed tests, they depends on no cpu feature, but `AvoidUnalignedAccesses`, so we should skip them. > For the tests such as `TestUseSHA256IntrinsicsOptionOnSupportedCPU` and `TestUseSHA512IntrinsicsOptionOnSupportedCPU`, they depends on CPU feature `zvkn`, so they run successfully. > > We could modify the test framework to support passing `AvoidUnalignedAccesses` as `unsupportedCPUFeatures`, but seems to me it's not worth to do so, something like below: > > --- a/test/hotspot/jtreg/compiler/testlibrary/sha/predicate/IntrinsicPredicates.java > +++ b/test/hotspot/jtreg/compiler/testlibrary/sha/predicate/IntrinsicPredicates.java > @@ -61,7 +61,7 @@ public class IntrinsicPredicates { > > public static final BooleanSupplier MD5_INSTRUCTION_AVAILABLE > = new OrPredicate(new CPUSpecificPredicate("aarch64.*", null, null), > - new OrPredicate(new CPUSpecificPredicate("riscv64.*", null, null), > + new OrPredicate(new CPUSpecificPredicate("riscv64.*", null, new String[] { "AvoidUnalignedAccesses" }), > // x86 variants > new OrPredicate(new CPUSpecificPredicate("amd64.*", null, null), > new OrPredicate(new CPUSpecificPredicate("i386.*", null, null), > @@ -70,7 +70,7 @@ public class IntrinsicPredicates { > public static final BooleanSupplier SHA1_INSTRUCTION_AVAILABLE > = new OrPredicate(new CPUSpecificPredicate("aarch64.*", new 
String[] { "sha1" }, null), > // SHA-1 intrinsic is implemented with scalar instructions on riscv64 > - new OrPredicate(new CPUSpecificPredicate("riscv64.*", null, null), > + new OrPredicate(new CPUSpecificPredicate("riscv64.*", null, new String[] { "AvoidUnalignedAccesses" }), > new OrPredicate(new CPUSpecificPredicate("s390.*", new String[] { "sha1" }, null), > // x86 variants > new OrPredicate(new CPUSpecificPredicate("amd64.*", new String[] { "sha" }, null), > > > So, I'll add some comment about why we disable these tests on riscv. > Yes, I got the test wrong. Thank @Hamlin-Li explaining it to me. I'm fine with just disabling it, as we basically need to predicate on AvoidUnalignedAccesses. Which in its turn is a predicate for UseMD5Instrinsic. So did we even test anything ? That is not clear to me and if there was something tested it seems to be of very little value. Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24027#issuecomment-2733535899 From chagedorn at openjdk.org Tue Mar 18 14:54:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 18 Mar 2025 14:54:13 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor [v2] In-Reply-To: References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: On Tue, 18 Mar 2025 14:14:49 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/predicates.cpp line 1195: >> >>> 1193: mark_predicates_on_list_maybe_useful(_parse_predicates); >>> 1194: mark_predicates_on_list_maybe_useful(_template_assertion_predicate_opaques); >>> 1195: } >> >> Suggestion: >> >> void EliminateUselessPredicates::mark_all_predicates_maybe_useful() const { >> mark_predicates_on_list_maybe_useful(_parse_predicates); >> mark_predicates_on_list_maybe_useful(_template_assertion_predicate_opaques); >> } > > For consistency? Definitely an inconsistency, good catch! 
Will update this and the other occurrences. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24013#discussion_r2001243682 From chagedorn at openjdk.org Tue Mar 18 15:05:32 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 18 Mar 2025 15:05:32 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor [v3] In-Reply-To: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: > This patch cleans up the Parse and Template Assertion Predicate elimination code. We now use a single `PredicateVisitor` and share code in a new `EliminateUselessPredicates` class which contains the code previously found in `PhaseIdealLoop::eliminate_useless_predicates()`. > > ### Unified Logic to Clean Up Parse and Template Assertion Predicates > We now use the following algorithm: > https://github.com/openjdk/jdk/blob/5e4b6ca0ddafa80eee60690caacd257b74305d4e/src/hotspot/share/opto/predicates.cpp#L1174-L1179 > > This is different from the old algorithm where we used a single boolean state `_useless`. But that no longer works because when we first mark Template Assertion Predicates useless, we no longer visit them when iterating through predicates: > > https://github.com/openjdk/jdk/blob/a21fa463c4f8d067c18c09a072f3cdfa772aea5e/src/hotspot/share/opto/predicates.hpp#L704-L708 > > We therefore require a third state. Thus, I introduced a new tri-state `PredicateState` that provides a special `MaybeUseful` value which we can set each Predicate to. > > #### Ignoring Useless Parse Predicates > While working on this patch, I've noticed that we are always visiting Parse Predicates - even when they are useless. We should change that to align it with what we have for the other Predicates (changed in JDK-8351280).
To make this work, we also replace the `_useless` state in `ParsePredicateNode` with a new `PredicateState`. > > #### Sharing Code for Parse and Template Assertion Predicates > With all the mentioned changes in place, I could nicely share code for the elimination of Parse and Template Assertion Predicates in `EliminateUselessPredicates` by using templates. The following additional changes were required: > > - Changing the template parameter of `_template_assertion_predicate_opaques` to the more specific `OpaqueTemplateAssertionPredicateNode` type. > - Adding accessor methods to get the Predicate lists from `Compile`. > - Updating `ParsePredicate::mark_useless()` to pass in `PhaseIterGVN`, as done for Assertion Predicates > > Note that we still do not directly replace the useless Predicates but rather mark them useless as initiated by JDK-8351280. > > ### Other Included Changes > - During the various refactoring steps, I somehow dropped the code to add newly cloned Template Assertion Predicate to the `_template_assertion_predicate_opaques` list. It was done directly in the old cloning methods. This is not relevant for correctness but could hinder some optimizations. I've added the code now i... 
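The three-state scheme described in the PR text above can be sketched in a few lines. This is only an illustration under stated assumptions, not the HotSpot code: `ToyPredicate`, `visit_predicates_of_loop` and `eliminate_useless` are hypothetical names, the enum values are modelled on the `PredicateState` and `MaybeUseful` names mentioned in the description, and the three passes (reset, promote, demote) are one plausible reading of the linked algorithm.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the tri-state mentioned in the PR description.
enum class PredicateState { Useless, MaybeUseful, Useful };

// Toy predicate: only its state and the loop it belongs to matter here.
struct ToyPredicate {
  int loop_id;
  PredicateState state;
  explicit ToyPredicate(int id) : loop_id(id), state(PredicateState::Useful) {}
};

// Mimics a visitor that skips predicates already marked useless. With a
// plain boolean, "not reached yet" and "useless" would be the same value,
// so a reset pass would hide every predicate from this visitor.
template <typename Fn>
void visit_predicates_of_loop(std::vector<ToyPredicate>& all, int loop_id, Fn fn) {
  for (ToyPredicate& p : all) {
    if (p.loop_id == loop_id && p.state != PredicateState::Useless) {
      fn(p);
    }
  }
}

void eliminate_useless(std::vector<ToyPredicate>& all, const std::vector<int>& live_loops) {
  // Pass 1: everything becomes MaybeUseful, which the visitor still visits.
  for (ToyPredicate& p : all) {
    p.state = PredicateState::MaybeUseful;
  }
  // Pass 2: predicates reachable from a live loop are promoted to Useful.
  for (int loop_id : live_loops) {
    visit_predicates_of_loop(all, loop_id,
                             [](ToyPredicate& p) { p.state = PredicateState::Useful; });
  }
  // Pass 3: whatever was never reached is demoted to Useless.
  for (ToyPredicate& p : all) {
    if (p.state == PredicateState::MaybeUseful) {
      p.state = PredicateState::Useless;
    }
  }
}
```

In this toy, a predicate whose loop is not in `live_loops` ends up `Useless`, while predicates of live loops end up `Useful`; nothing is removed, which matches the note that useless Predicates are only marked, not replaced directly.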
Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Emanuel's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24013/files - new: https://git.openjdk.org/jdk/pull/24013/files/1508a3be..4c9f5166 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24013&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24013&range=01-02 Stats: 11 lines in 2 files changed: 1 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24013.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24013/head:pull/24013 PR: https://git.openjdk.org/jdk/pull/24013 From chagedorn at openjdk.org Tue Mar 18 15:05:32 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 18 Mar 2025 15:05:32 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor [v2] In-Reply-To: References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: On Fri, 14 Mar 2025 10:32:01 GMT, Christian Hagedorn wrote: >> This patch cleans the Parse and Template Assertion Predicate elimination code up. We now use a single `PredicateVisitor` and share code in a new `EliminateUselessPredicates` class which contains the code previously found in `PhaseIdealLoop::eliminate_useless_predicates()`. >> >> ### Unified Logic to Clean Up Parse and Template Assertion Predicates >> We now use the following algorithm: >> https://github.com/openjdk/jdk/blob/5e4b6ca0ddafa80eee60690caacd257b74305d4e/src/hotspot/share/opto/predicates.cpp#L1174-L1179 >> >> This is different from the old algorithm where we used a single boolean state `_useless`. 
But that does no longer work because when we first mark Template Assertion Predicates useless, we are no longer visiting them when iterating through predicates: >> >> https://github.com/openjdk/jdk/blob/a21fa463c4f8d067c18c09a072f3cdfa772aea5e/src/hotspot/share/opto/predicates.hpp#L704-L708 >> >> We therefore require a third state. Thus, I introduced a new tri-state `PredicateState` that provides a special `MaybeUseful` value which we can set each Predicate to. >> >> #### Ignoring Useless Parse Predicates >> While working on this patch, I've noticed that we are always visiting Parse Predicates - even when they useless. We should change that to align it with what we have for the other Predicates (changed in JDK-8351280). To make this work, we also replace the `_useless` state in `ParsePredicateNode` with a new `PredicateState`. >> >> #### Sharing Code for Parse and Template Assertion Predicates >> With all the mentioned changes in place, I could nicely share code for the elimination of Parse and Template Assertion Predicates in `EliminateUselessPredicates` by using templates. The following additional changes were required: >> >> - Changing the template parameter of `_template_assertion_predicate_opaques` to the more specific `OpaqueTemplateAssertionPredicateNode` type. >> - Adding accessor methods to get the Predicate lists from `Compile`. >> - Updating `ParsePredicate::mark_useless()` to pass in `PhaseIterGVN`, as done for Assertion Predicates >> >> Note that we still do not directly replace the useless Predicates but rather mark them useless as initiated by JDK-8351280. >> >> ### Other Included Changes >> - During the various refactoring steps, I somehow dropped the code to add newly cloned Template Assertion Predicate to the `_template_assertion_predicate_opaques` list. It was done directly in the old cloning methods. This is not relevant for correctness but could ... 
> > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Introduce predicates_enums.hpp Thanks Roland and Emanuel for your reviews! I've pushed an updated with your suggestions Emanuel. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24013#issuecomment-2733575867 From chagedorn at openjdk.org Tue Mar 18 15:17:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 18 Mar 2025 15:17:12 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable [v4] In-Reply-To: References: Message-ID: <9_FoKM18b5ywtJq7zwxugWHL-_trD_0bGl1VWwo6SI0=.bffb4a01-1b7f-4b27-a77c-b7d9ba17b588@github.com> On Tue, 18 Mar 2025 10:04:00 GMT, Emanuel Peter wrote: >> With the Template Framework, I'm generating IR tests randomly. But random code can always hit bailouts in compilation, and make code not compilable any more. We should have a way to disable this check, and just gracefully continue to execute the tests. >> >> To allow a single test method to be `not compilable`: >> https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L160-L161 >> >> To allow all test methods to be `not compilable`: >> https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L140-L144 >> >> See also this documentation in the code: >> https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/compiler/lib/ir_framework/Test.java#L88-L94 >> >> --------------------------------------- >> >> **Background** >> >> My random code seems to hit a bailout in the Register Allocator, and I cannot do much to predict if that bailout happens. 
>> See https://bugs.openjdk.org/browse/JDK-8304328 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24049#pullrequestreview-2694995446 From epeter at openjdk.org Tue Mar 18 15:20:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Mar 2025 15:20:09 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor [v3] In-Reply-To: References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: On Tue, 18 Mar 2025 15:05:32 GMT, Christian Hagedorn wrote: >> This patch cleans the Parse and Template Assertion Predicate elimination code up. We now use a single `PredicateVisitor` and share code in a new `EliminateUselessPredicates` class which contains the code previously found in `PhaseIdealLoop::eliminate_useless_predicates()`. >> >> ### Unified Logic to Clean Up Parse and Template Assertion Predicates >> We now use the following algorithm: >> https://github.com/openjdk/jdk/blob/5e4b6ca0ddafa80eee60690caacd257b74305d4e/src/hotspot/share/opto/predicates.cpp#L1174-L1179 >> >> This is different from the old algorithm where we used a single boolean state `_useless`. But that does no longer work because when we first mark Template Assertion Predicates useless, we are no longer visiting them when iterating through predicates: >> >> https://github.com/openjdk/jdk/blob/a21fa463c4f8d067c18c09a072f3cdfa772aea5e/src/hotspot/share/opto/predicates.hpp#L704-L708 >> >> We therefore require a third state. Thus, I introduced a new tri-state `PredicateState` that provides a special `MaybeUseful` value which we can set each Predicate to. 
>> >> #### Ignoring Useless Parse Predicates >> While working on this patch, I've noticed that we are always visiting Parse Predicates - even when they useless. We should change that to align it with what we have for the other Predicates (changed in JDK-8351280). To make this work, we also replace the `_useless` state in `ParsePredicateNode` with a new `PredicateState`. >> >> #### Sharing Code for Parse and Template Assertion Predicates >> With all the mentioned changes in place, I could nicely share code for the elimination of Parse and Template Assertion Predicates in `EliminateUselessPredicates` by using templates. The following additional changes were required: >> >> - Changing the template parameter of `_template_assertion_predicate_opaques` to the more specific `OpaqueTemplateAssertionPredicateNode` type. >> - Adding accessor methods to get the Predicate lists from `Compile`. >> - Updating `ParsePredicate::mark_useless()` to pass in `PhaseIterGVN`, as done for Assertion Predicates >> >> Note that we still do not directly replace the useless Predicates but rather mark them useless as initiated by JDK-8351280. >> >> ### Other Included Changes >> - During the various refactoring steps, I somehow dropped the code to add newly cloned Template Assertion Predicate to the `_template_assertion_predicate_opaques` list. It was done directly in the old cloning methods. This is not relevant for correctness but could ... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Emanuel's review Thanks for the updates. I'm rubber stamping this, only did a quick scan ;) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24013#pullrequestreview-2695010001 From chagedorn at openjdk.org Tue Mar 18 15:24:08 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 18 Mar 2025 15:24:08 GMT Subject: RFR: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor [v3] In-Reply-To: References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: On Tue, 18 Mar 2025 15:05:32 GMT, Christian Hagedorn wrote: >> This patch cleans the Parse and Template Assertion Predicate elimination code up. We now use a single `PredicateVisitor` and share code in a new `EliminateUselessPredicates` class which contains the code previously found in `PhaseIdealLoop::eliminate_useless_predicates()`. >> >> ### Unified Logic to Clean Up Parse and Template Assertion Predicates >> We now use the following algorithm: >> https://github.com/openjdk/jdk/blob/5e4b6ca0ddafa80eee60690caacd257b74305d4e/src/hotspot/share/opto/predicates.cpp#L1174-L1179 >> >> This is different from the old algorithm where we used a single boolean state `_useless`. But that does no longer work because when we first mark Template Assertion Predicates useless, we are no longer visiting them when iterating through predicates: >> >> https://github.com/openjdk/jdk/blob/a21fa463c4f8d067c18c09a072f3cdfa772aea5e/src/hotspot/share/opto/predicates.hpp#L704-L708 >> >> We therefore require a third state. Thus, I introduced a new tri-state `PredicateState` that provides a special `MaybeUseful` value which we can set each Predicate to. >> >> #### Ignoring Useless Parse Predicates >> While working on this patch, I've noticed that we are always visiting Parse Predicates - even when they useless. We should change that to align it with what we have for the other Predicates (changed in JDK-8351280). To make this work, we also replace the `_useless` state in `ParsePredicateNode` with a new `PredicateState`. 
>> >> #### Sharing Code for Parse and Template Assertion Predicates >> With all the mentioned changes in place, I could nicely share code for the elimination of Parse and Template Assertion Predicates in `EliminateUselessPredicates` by using templates. The following additional changes were required: >> >> - Changing the template parameter of `_template_assertion_predicate_opaques` to the more specific `OpaqueTemplateAssertionPredicateNode` type. >> - Adding accessor methods to get the Predicate lists from `Compile`. >> - Updating `ParsePredicate::mark_useless()` to pass in `PhaseIterGVN`, as done for Assertion Predicates >> >> Note that we still do not directly replace the useless Predicates but rather mark them useless as initiated by JDK-8351280. >> >> ### Other Included Changes >> - During the various refactoring steps, I somehow dropped the code to add newly cloned Template Assertion Predicate to the `_template_assertion_predicate_opaques` list. It was done directly in the old cloning methods. This is not relevant for correctness but could ... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Emanuel's review Thanks Emanuel! I will run some testing again with the recent updates before integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24013#issuecomment-2733650892 From eastigeevich at openjdk.org Tue Mar 18 15:40:16 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 18 Mar 2025 15:40:16 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v6] In-Reply-To: References: <8BMaz7Ssnwk6ywuPYfgaL1VSN7XZYBzpvMWRsG12-Eg=.441d03b6-4a0b-479b-b789-d9bfd03bd346@github.com> Message-ID: <0DkMl4UCWkNIUpFJKP2z4T-nP0NjJsrhbuczJLeWVHM=.64561b36-8472-4159-8d90-7cbec2b58d5d@github.com> On Tue, 18 Mar 2025 08:24:51 GMT, Dean Long wrote: >> Thank you for checking it. 
> > Other nmethod contructors don't have the same locking requirements, because the nmethod hasn't been registered with GC yet. However, for the source nmethod, it could be concurrently patched by GC threads without codeCache_lock and only the per-nmethod CompiledICLocker locking mechanism. So using memcpy() seems problematic here, because a byte-by-byte copy might see on partial updates from NativeCall::set_destination_mt_safe, for example. > Also, there seems to be a critical race with GC here. The destination nmethod isn't going to be registered with GC yet, correct? In that case, GC may patch the source nmethod right after the copy, but before the destination copy is registered, leaving the destination copy with stale data. This seems fatal, as I believe this breaks crucial invariants preventing call sites from referencing stale data. @fisk, am I on the right track here? Hi @dean-long, I see two changes can be made in nmethod: 1. Call sites are patched because of changes in callees. 2. Oops used in nmethod are updated. The first change should not be a problem if we clear all call site. It's already done in the current code by clearing inline caches of the original nmethod. As discussed, we should not clear inline caches of the original but the copy. How the second change (oops) is addressed when we create a new nmethod? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2001347037 From kvn at openjdk.org Tue Mar 18 18:09:35 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Mar 2025 18:09:35 GMT Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer Message-ID: <2TCR23-qL2wo6tYphbavzD_rvDhxtX4v49di2MoL5AU=.bfb9dc0a-f0bb-4634-a1bf-56a341d31af8@github.com> Before [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789), `relocation_begin()` was never null, even when there were no relocations: in that case it pointed to the beginning of the constant or code section. The relocation code relied on this to stay simple and avoid null checks. With that fix, `relocation_begin()` points to the address in the `CodeBlob::_mutable_data` field, which could be `nullptr` if there are no relocations and no metadata. The easy fix is to avoid `nullptr` in `CodeBlob::_mutable_data`. We could do that similarly to what we do for `nmethod::_immutable_data`: [nmethod.cpp#L1514](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1514). Tested tier1-4, stress, xcomp. Verified with the failed tests listed in the bug report.
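The large offset in the UBSan message is easier to read as a signed value. The following is a minimal sketch, assuming that the flagged expression steps back by one record from a null base and that a relocation record is 2 bytes wide; both are assumptions made for illustration (the message itself only gives the file, line and offset), and `ToyRelocRecord` and `byte_offset` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a relocation record is 2 bytes wide, modelled as uint16_t.
using ToyRelocRecord = std::uint16_t;

// Byte offset, viewed as an unsigned 64-bit value, that results from
// stepping `elements` records away from some base address.
constexpr std::uint64_t byte_offset(std::int64_t elements) {
  return static_cast<std::uint64_t>(
      elements * static_cast<std::int64_t>(sizeof(ToyRelocRecord)));
}
```

Under these assumptions, `byte_offset(-1)` is -2 reinterpreted as unsigned, that is 2^64 - 2, which is the number reported in the bug title. Stepping back by one record is harmless for a valid base address but is flagged when the base is `nullptr`.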
------------- Commit messages: - [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer Changes: https://git.openjdk.org/jdk/pull/24100/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24100&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352112 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24100.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24100/head:pull/24100 PR: https://git.openjdk.org/jdk/pull/24100 From kvn at openjdk.org Tue Mar 18 18:20:18 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Mar 2025 18:20:18 GMT Subject: Withdrawn: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer In-Reply-To: <2TCR23-qL2wo6tYphbavzD_rvDhxtX4v49di2MoL5AU=.bfb9dc0a-f0bb-4634-a1bf-56a341d31af8@github.com> References: <2TCR23-qL2wo6tYphbavzD_rvDhxtX4v49di2MoL5AU=.bfb9dc0a-f0bb-4634-a1bf-56a341d31af8@github.com> Message-ID: <08CYBmYOtZkWOLIGgFbZDQ5UKTwPQXYecOvNN9eRdxw=.7d800a5d-0737-43e7-bc65-0bc5b8d6dc52@github.com> On Tue, 18 Mar 2025 18:01:58 GMT, Vladimir Kozlov wrote: > Before [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789) `relocation_begin()` was never null even when there was no relocations - it pointed to the beginning of constant or code section in such case. It was used by relocation code to simplify code and avoid null checks. > With that fix `relocation_begin()` points to address in `CodeBlob::_mutable_data` field which could be `nullptr` if there is no relocation and metadata. > > There easy fix is to avoid `nullptr` in `CodeBlob::_mutable_data`. We could do that similar to what we do for `nmethod::_immutable_data`: [nmethod.cpp#L1514](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1514). > > Tested tier1-4, stress, xcomp. Verified with failed tests listed in bug report. 
This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/24100 From kvn at openjdk.org Tue Mar 18 18:40:45 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Mar 2025 18:40:45 GMT Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer Message-ID: Before [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789), `relocation_begin()` was never null, even when there were no relocations: in that case it pointed to the beginning of the constant or code section. The relocation code relied on this to stay simple and avoid null checks. With that fix, `relocation_begin()` points to the address in the `CodeBlob::_mutable_data` field, which could be `nullptr` if there are no relocations and no metadata. The easy fix is to avoid `nullptr` in `CodeBlob::_mutable_data`. We could do that similarly to what we do for `nmethod::_immutable_data`: [nmethod.cpp#L1514](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1514). Tested tier1-4, stress, xcomp. Verified with the failed tests listed in the bug report.
------------- Commit messages: - 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer Changes: https://git.openjdk.org/jdk/pull/24102/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24102&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352112 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24102.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24102/head:pull/24102 PR: https://git.openjdk.org/jdk/pull/24102 From kvn at openjdk.org Tue Mar 18 20:02:12 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Mar 2025 20:02:12 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v6] In-Reply-To: <0DkMl4UCWkNIUpFJKP2z4T-nP0NjJsrhbuczJLeWVHM=.64561b36-8472-4159-8d90-7cbec2b58d5d@github.com> References: <8BMaz7Ssnwk6ywuPYfgaL1VSN7XZYBzpvMWRsG12-Eg=.441d03b6-4a0b-479b-b789-d9bfd03bd346@github.com> <0DkMl4UCWkNIUpFJKP2z4T-nP0NjJsrhbuczJLeWVHM=.64561b36-8472-4159-8d90-7cbec2b58d5d@github.com> Message-ID: On Tue, 18 Mar 2025 15:37:00 GMT, Evgeny Astigeevich wrote: > How the second change (oops) is addressed when we create a new nmethod Called from `nmethod()` constructor [nmethod::copy_values()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1741) replaces handles used during compilation with oops. Note, `new_nmethod()` holds `CodeCache_lock` and `ciEnv::register_method()` holds `Compile_lock` (and `MethodCompileQueue_lock`). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2001891108 From bulasevich at openjdk.org Tue Mar 18 20:36:21 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 18 Mar 2025 20:36:21 GMT Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer In-Reply-To: <2TCR23-qL2wo6tYphbavzD_rvDhxtX4v49di2MoL5AU=.bfb9dc0a-f0bb-4634-a1bf-56a341d31af8@github.com> References: <2TCR23-qL2wo6tYphbavzD_rvDhxtX4v49di2MoL5AU=.bfb9dc0a-f0bb-4634-a1bf-56a341d31af8@github.com> Message-ID: On Tue, 18 Mar 2025 18:01:58 GMT, Vladimir Kozlov wrote: > Before [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789) `relocation_begin()` was never null even when there was no relocations - it pointed to the beginning of constant or code section in such case. It was used by relocation code to simplify code and avoid null checks. > With that fix `relocation_begin()` points to address in `CodeBlob::_mutable_data` field which could be `nullptr` if there is no relocation and metadata. > > There easy fix is to avoid `nullptr` in `CodeBlob::_mutable_data`. We could do that similar to what we do for `nmethod::_immutable_data`: [nmethod.cpp#L1514](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1514). > > Tested tier1-4, stress, xcomp. Verified with failed tests listed in bug report. Thanks for fixing that! I managed to reproduce the issue on a fastdebug build on linux-x64 with the --enable-ubsan configuration option. With this change, all UBSAN "applying non-zero offset to null pointer" errors are resolved. I have some doubts I'd like to discuss. [1] This change seems to be a workaround. Setting pointers to nullptr is a standard practice when no meaningful value is available. The RelocationHandler performs pointer arithmetic on the address without checking its validity. Shouldn't the issue be addressed in RelocationHandler instead? 
[2] If we stick to setting a reasonable value for _mutable_data, I have concerns about the chosen value. Why blob_end? We can't even be sure it's within the CodeCache range. Wouldn't it be better to set _mutable_data = header_begin()? Also, it does not seem good that _mutable_data is initialized with different values: once in the member initializer list and then again in the method body. Can we set a default value in the member initializer list instead? Also, I think that we should update the _mutable_data initial value in the second CodeBlob constructor as well. Otherwise CodeBlob::purge can call std::free with nullptr for non-nmethod CodeBlobs (std::free safely handles nullptr, but it's better to avoid relying on it). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24100#issuecomment-2734652066 From bulasevich at openjdk.org Tue Mar 18 20:37:07 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 18 Mar 2025 20:37:07 GMT Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 18:35:06 GMT, Vladimir Kozlov wrote: > Before [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789), `relocation_begin()` was never null, even when there were no relocations: in that case it pointed to the beginning of the constant or code section. The relocation code relied on this to stay simple and avoid null checks. > With that fix, `relocation_begin()` points to the address in the `CodeBlob::_mutable_data` field, which could be `nullptr` if there are no relocations and no metadata. > > The easy fix is to avoid `nullptr` in `CodeBlob::_mutable_data`. We could do that similarly to what we do for `nmethod::_immutable_data`: [nmethod.cpp#L1514](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1514). > > Tested tier1-4, stress, xcomp. Verified with the failed tests listed in the bug report. Thanks for fixing that!
I managed to reproduce the issue on a fastdebug build on linux-x64 with the --enable-ubsan configuration option. With this change, all UBSAN "applying non-zero offset to null pointer" errors are resolved. I have some doubts I'd like to discuss. [1] This change seems to be a workaround. Setting pointers to nullptr is a standard practice when no meaningful value is available. The RelocationHandler performs pointer arithmetic on the address without checking its validity. Shouldn't the issue be addressed in RelocationHandler instead? [2] If we stick to setting a reasonable value for _mutable_data, I have concerns about the chosen value. Why blob_end? We can't even be sure it's within the CodeCache range. Wouldn't it be better to set _mutable_data = header_begin()? Also, it does not seem good that _mutable_data is initialized with different values: once in the member initializer list and then again in the method body. Can we set a default value in the member initializer list instead? Also, I think that we should update the _mutable_data initial value in the second CodeBlob constructor as well. Otherwise CodeBlob::purge can call std::free with nullptr for non-nmethod CodeBlobs (std::free safely handles nullptr, but it's better to avoid relying on it). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24102#issuecomment-2734653924 From kvn at openjdk.org Tue Mar 18 21:52:08 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Mar 2025 21:52:08 GMT Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 20:34:06 GMT, Boris Ulasevich wrote: > This change seems to be a workaround. Setting pointers to nullptr is a standard practice when no meaningful value is available. The RelocationHandler performs pointer arithmetic on the address without checking its validity.
Shouldn't the issue be addressed in RelocationHandler instead?

Agree. But we can't simply bail out of the `RelocIterator()` constructor, because we advertise the following API:

// RelocIterator iter(nm);
// while (iter.next()) {

Also, `RelocIterator::next()` does pointer arithmetic to determine when to stop iterating. So we need to set `_current = _end - 1` if we don't want to modify and complicate it. That means we need to set these fields to some valid addresses inside `RelocIterator()` anyway. So why not do that by setting a valid address in `_mutable_data` and be done?

------------- PR Comment: https://git.openjdk.org/jdk/pull/24102#issuecomment-2734820733 From kvn at openjdk.org Tue Mar 18 22:10:06 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Mar 2025 22:10:06 GMT Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 20:34:06 GMT, Boris Ulasevich wrote:

> If we stick to setting a reasonable value for _mutable_data, I have concerns about the chosen value. Why blob_end? We can't even be sure it's within the CodeCache range. Wouldn't it be better to set _mutable_data = header_begin()?

I really think we should use the same value for `_mutable_data` and `_immutable_data`. I also have a concern about using `header_begin()`, which is the address of the code blob itself. `blob_end()` is already tested for `_immutable_data`, so it is a safe choice.

> Also, I think that we should update the _mutable_data initial value in the second CodeBlob constructor as well.

I update it in both constructors.
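The trade-off discussed in these replies, keeping the iterator's pointer arithmetic unchanged and instead guaranteeing a non-null base, can be sketched with a toy model. This is a hedged illustration only: `ToyBlob`, `ToyIterator` and `count_records` are hypothetical names and do not mirror the real `CodeBlob` or `RelocIterator` layout. The sketch assumes the iterator starts one record before the range and pre-increments in `next()`, as the `_current = _end - 1` remark suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 2-byte record standing in for a relocation entry.
using ToyRecord = std::uint16_t;

// Toy blob: some storage of its own plus an optional record range.
// With no records, `records` is set to the end of the blob's own storage
// (a valid, non-null address) rather than to nullptr.
struct ToyBlob {
  ToyRecord storage[4];
  ToyRecord* records;
  std::size_t count;

  ToyBlob(ToyRecord* data, std::size_t n)
      : storage{0, 0, 0, 0},
        records(n == 0 ? storage + 4 : data),  // uses parameter n, not member count
        count(n) {}
  ToyBlob(const ToyBlob&) = delete;  // the sentinel points into this object

  ToyRecord* records_begin() { return records; }
  ToyRecord* records_end() { return records + count; }
};

// Iterator in the "while (iter.next())" style: it starts one record before
// the range and pre-increments. Precondition for this toy: there is at
// least one record-sized slot in front of records_begin().
class ToyIterator {
 public:
  explicit ToyIterator(ToyBlob& blob)
      : _current(blob.records_begin() - 1), _end(blob.records_end()) {}
  bool next() { return ++_current != _end; }
  ToyRecord value() const { return *_current; }

 private:
  ToyRecord* _current;
  ToyRecord* _end;
};

std::size_t count_records(ToyBlob& blob) {
  std::size_t n = 0;
  for (ToyIterator it(blob); it.next();) {
    ++n;
  }
  return n;
}
```

For an empty blob the iterator starts at the last slot of `storage`, the first `next()` lands on the end address, and the loop body never runs; no null checks are needed and no arithmetic is done on `nullptr`. The cost is that "no data" is now encoded as an empty range at a sentinel address, so any code that frees or tests the pointer has to know about the sentinel.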
------------- PR Comment: https://git.openjdk.org/jdk/pull/24102#issuecomment-2734847573 PR Comment: https://git.openjdk.org/jdk/pull/24102#issuecomment-2734848300 From kvn at openjdk.org Tue Mar 18 22:15:21 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Mar 2025 22:15:21 GMT Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 20:34:06 GMT, Boris Ulasevich wrote: > Can we set a default value in the member initializer list instead? We can't do that without moving `_mutable_data ` field and introducing empty padding. I prefer to keep address fields together. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24102#issuecomment-2734854853 From kvn at openjdk.org Tue Mar 18 22:21:06 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Mar 2025 22:21:06 GMT Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 18:35:06 GMT, Vladimir Kozlov wrote: > Before [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789) `relocation_begin()` was never null even when there was no relocations - it pointed to the beginning of constant or code section in such case. It was used by relocation code to simplify code and avoid null checks. > With that fix `relocation_begin()` points to address in `CodeBlob::_mutable_data` field which could be `nullptr` if there is no relocation and metadata. > > There easy fix is to avoid `nullptr` in `CodeBlob::_mutable_data`. We could do that similar to what we do for `nmethod::_immutable_data`: [nmethod.cpp#L1514](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1514). > > Tested tier1-4, stress, xcomp. Verified with failed tests listed in bug report. 
Actually I can do this (in both constructors):

    +++ b/src/hotspot/share/code/codeBlob.cpp
    @@ -121,7 +121,7 @@ CodeBlob::CodeBlob(const char* name, CodeBlobKind kind, CodeBuffer* cb, int size
                        int mutable_data_size) :
       _oop_maps(nullptr), // will be set by set_oop_maps() call
       _name(name),
    -  _mutable_data(nullptr),
    +  _mutable_data(header_begin() + size), // default value is blob_end()
       _size(size),
       _relocation_size(align_up(cb->total_relocation_size(), oopSize)),
       _content_offset(CodeBlob::align_code_offset(header_size)),
    @@ -153,7 +153,7 @@ CodeBlob::CodeBlob(const char* name, CodeBlobKind kind, CodeBuffer* cb, int size
         }
       } else {
         // We need unique and valid not null address
    -    _mutable_data = blob_end();
    +    assert(_mutable_data == blob_end(), "sanity");
       }

       set_oop_maps(oop_maps);

What do you think?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24102#issuecomment-2734862556

From sparasa at openjdk.org Tue Mar 18 22:58:09 2025
From: sparasa at openjdk.org (Srinivas Vamsi Parasa)
Date: Tue, 18 Mar 2025 22:58:09 GMT
Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v14]
In-Reply-To:
References:
Message-ID:

On Thu, 13 Mar 2025 22:53:43 GMT, Srinivas Vamsi Parasa wrote:

>> LGTM,
>>
>> Please file a JBS on future modification in assembler layer for EEVEX to REX/REX2 encoding and append to this PR before committing.
>>
>> Thanks.
>
>> LGTM,
>>
>> Please file a JBS on future modification in assembler layer for EEVEX to REX/REX2 encoding and append to this PR before committing.
>>
>> Thanks.
>
> Thanks for the review Jatin! The JBS for EEVEX to REX/REX2 demotion has been created: https://bugs.openjdk.org/browse/JDK-8351994
>
> Thanks,
> Vamsi
>
> @vamsi-parasa I tried to launch testing, but my script fails because of some merge issue. Would you mind merging from master?
>
> Hi Emanuel (@eme64), please see the updated code after the merge with master.

Hi Emanuel (@eme64), could you please let me know if you're still seeing script failure?
-------------

PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2734910648

From bulasevich at openjdk.org Tue Mar 18 23:40:06 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Tue, 18 Mar 2025 23:40:06 GMT
Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer
In-Reply-To:
References:
Message-ID:

On Tue, 18 Mar 2025 21:49:53 GMT, Vladimir Kozlov wrote:

> So why not do that by setting valid address to _mutable_data and done?

Yes. We already have _immutable_data set to blob_end(). By setting _mutable_data to the same value, we align code with _immutable_data and eliminate the UBSAN error, and there's no need to modify RelocIterator.

> Actually I can do this (in both constructors):
> | -  _mutable_data(nullptr),
> | +  _mutable_data(header_begin() + size), // default value is blob_end()
> |
> | -    _mutable_data = blob_end();
> | +    assert(_mutable_data == blob_end(), "sanity");
>
> What do you think?

Good. It seems correct. Thank you!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24102#issuecomment-2734958828

From dlong at openjdk.org Wed Mar 19 00:06:08 2025
From: dlong at openjdk.org (Dean Long)
Date: Wed, 19 Mar 2025 00:06:08 GMT
Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer
In-Reply-To:
References:
Message-ID:

On Tue, 18 Mar 2025 18:35:06 GMT, Vladimir Kozlov wrote:

> Before [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789) `relocation_begin()` was never null even when there was no relocations - it pointed to the beginning of constant or code section in such case. It was used by relocation code to simplify code and avoid null checks.
> With that fix `relocation_begin()` points to address in `CodeBlob::_mutable_data` field which could be `nullptr` if there is no relocation and metadata.
>
> The easy fix is to avoid `nullptr` in `CodeBlob::_mutable_data`. We could do that similar to what we do for `nmethod::_immutable_data`: [nmethod.cpp#L1514](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1514).
>
> Tested tier1-4, stress, xcomp. Verified with failed tests listed in bug report.

src/hotspot/share/code/codeBlob.cpp line 156:

> 154:   } else {
> 155:     // We need unique and valid not null address
> 156:     _mutable_data = blob_end();

It makes me a little nervous pointing this value to real data. When RelocIterator computes `_current = nm->relocation_begin() - 1`, it should never read or write from that address, but how can we guarantee that? Any non-null address that is guaranteed unmapped would do, or a special protected page like `bad_page` here: https://github.com/openjdk/jdk/blob/8e530633a9d99d7ce585cafd5573cb89212feee7/src/hotspot/share/runtime/safepointMechanism.cpp#L66. If using protected memory seems like overkill, then I suggest using a static. Something like this:

    static union {
      relocInfo _dummy[1];
    } _empty[2];
    [...]
    _mutable_data = _empty+1;

However, I think this is not the first time we have run into this issue with RelocIterator. Maybe it's time that we rewrote it to avoid this situation?
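The static-sentinel idea can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the HotSpot `RelocIterator`: `Entry`, `kEmpty`, `EntryIterator` and `count_entries` are hypothetical names, and the iterator only mimics the "start one before begin, pre-increment in `next()`" pattern discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a relocation entry.
struct Entry { unsigned short bits; };

// Two dummy entries. An empty range starts at kEmpty + 1, so "begin - 1"
// still points at a real object (kEmpty[0]) and stays well defined.
static Entry kEmpty[2] = {};

class EntryIterator {
 public:
  EntryIterator(Entry* begin, Entry* end) {
    if (begin == nullptr) {
      // No entries at all: substitute the sentinel range instead of
      // doing arithmetic on a null pointer.
      begin = end = kEmpty + 1;
    }
    _current = begin - 1;  // positioned one entry before the first one
    _end = end;
  }

  // Advances to the next entry; returns false once the end is reached.
  bool next() {
    ++_current;
    assert(_current <= _end && "must not overrun the range");
    return _current != _end;
  }

 private:
  Entry* _current;
  Entry* _end;
};

// Counts how many entries the iterator visits for a given range.
std::size_t count_entries(Entry* begin, Entry* end) {
  EntryIterator it(begin, end);
  std::size_t n = 0;
  while (it.next()) {
    ++n;
  }
  return n;
}
```

The iterator never dereferences the sentinel. It only needs an address for which `begin - 1` is a valid pointer, which is the property the `_empty + 1` suggestion provides. A non-empty caller range has to satisfy the same precondition, for example by pointing into the middle of a larger buffer.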
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24102#discussion_r2002179287 From dlong at openjdk.org Wed Mar 19 00:44:11 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 19 Mar 2025 00:44:11 GMT Subject: RFR: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 [v5] In-Reply-To: <2VksBtd_XqgUIQpirjTmAkXUpVZPtahtmLfIoEVRC0A=.895aa101-0baa-461c-970d-b95f146a4f9a@github.com> References: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> <2VksBtd_XqgUIQpirjTmAkXUpVZPtahtmLfIoEVRC0A=.895aa101-0baa-461c-970d-b95f146a4f9a@github.com> Message-ID: On Mon, 17 Mar 2025 14:08:37 GMT, Andrew Dinn wrote: >> This PR reworks the existing AArch64 ML_DSA intrinsic code generator to make it clearer to read and easier to maintain. > > Andrew Dinn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - use references and const to avoid VSeq copying and fix int array arg issue > - fix comment > - fix invalid register argument > - fix errors in comments > - fix whitespace errors > - Clearer implementation of AArch64 dilithium generator I'm not an expert, but this looks good overall, and I'm relying on Andrew's testing to verify the details. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24026#pullrequestreview-2696581482 From rraj at openjdk.org Wed Mar 19 02:10:18 2025 From: rraj at openjdk.org (Rohit Arul Raj) Date: Wed, 19 Mar 2025 02:10:18 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 15:48:04 GMT, Paul Sandoz wrote: >> Rohit Arul Raj has updated the pull request incrementally with one additional commit since the last revision: >> >> create a separate method to check for cpu's supporting avx512 version of simd sort > > Looks good, thank you for updating. 
I am not a proper HotSpot reviewer so i bumped up the number of required reviewers, and a HotSpot developer needs to quickly review it. @PaulSandoz @vamsi-parasa : Can I integrate this patch? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24053#issuecomment-2735141329 From fyang at openjdk.org Wed Mar 19 02:44:08 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 19 Mar 2025 02:44:08 GMT Subject: RFR: 8320997: RISC-V: C2 ReverseV In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 10:42:21 GMT, Hamlin Li wrote: > Hi, > > Can you help to review this patch to implement ReverseV? > > Thanks! That looks great to me. Thanks! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24096#pullrequestreview-2696755905 From fyang at openjdk.org Wed Mar 19 02:46:08 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 19 Mar 2025 02:46:08 GMT Subject: RFR: 8351902: RISC-V: Several tests fail after JDK-8351145 [v2] In-Reply-To: References: Message-ID: <5KTCR5Au1rdzq225ASnfXd6Rdyp380-zXakiPJGJlfE=.6d4129e2-0d2c-4034-ae5e-aaca7f93d226@github.com> On Tue, 18 Mar 2025 12:37:02 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this simple patch? >> These client tests seems not that useful to me, so the simple solution could be just disable them on riscv. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > comments Thanks for finding this out! ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24027#pullrequestreview-2696757907 From psandoz at openjdk.org Wed Mar 19 03:47:08 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 19 Mar 2025 03:47:08 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: References: Message-ID: <79FSY9KcXhrrPH-ZrAEZINRSwWMcwC19mu3YzCrYL3M=.55f1ccea-1e7b-4990-8f0a-f3adc3ba86cd@github.com> On Mon, 17 Mar 2025 15:28:12 GMT, Rohit Arul Raj wrote: >> In JDK-8309130, Array sort was optimized using AVX512 SIMD instructions for x86_64. Currently, this optimization has been disabled for AMD Zen 4 [JDK-8317763] due to bad performance of compressstoreu. >> Ref: https://www.reddit.com/r/java/comments/171t5sj/heads_up_openjdk_implementation_of_avx512_based/. >> >> This patch enables Zen 4 to pick optimized AVX2 version of SIMD sort and Zen 5 picks the AVX512 version. >> >> JTREG Tests: Completed Tier1 & Tier2 tests on Zen4 & Zen5 - No Regressions. >> >> Attaching ArraySort performance data for Zen4 & Zen5. >> [Zen4-ArraySort-Data.txt](https://github.com/user-attachments/files/19245831/Zen4-ArraySort-Data.txt) >> [Zen5-ArraySort-Data.txt](https://github.com/user-attachments/files/19245833/Zen5-ArraySort-Data.txt) > > Rohit Arul Raj has updated the pull request incrementally with one additional commit since the last revision: > > create a separate method to check for cpu's supporting avx512 version of simd sort No, before you can do that need another review from a HotSpot reviewer. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24053#issuecomment-2735244838

From swen at openjdk.org Wed Mar 19 06:18:39 2025
From: swen at openjdk.org (Shaojin Wen)
Date: Wed, 19 Mar 2025 06:18:39 GMT
Subject: RFR: 8352316: More MergeStoreBench
In-Reply-To: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com>
References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com>
Message-ID:

On Wed, 19 Mar 2025 03:28:59 GMT, Shaojin Wen wrote:

> Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works

The performance numbers show that putNull_unsafePutInt and putNull_utf16_unsafePutLong perform more than 10 times better. It can be seen that MergeStore is very suitable for these scenarios.

# Script

    git remote add wenshao git@github.com:wenshao/jdk.git
    git fetch wenshao
    git checkout 23dba8c52454ae90eab4cb1b0a168c6e7249dd38
    make test TEST="micro:vm.compiler.MergeStoreBench.putNull"

## 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC Genoa)

    Benchmark                                     Mode  Cnt      Score      Error  Units
    MergeStoreBench.putNull_arraycopy             avgt    5   6715.041 ±   18.765  ns/op
    MergeStoreBench.putNull_getBytes              avgt    5   5880.725 ±   12.261  ns/op
    MergeStoreBench.putNull_getChars              avgt    5  11972.642 ±   24.990  ns/op
    MergeStoreBench.putNull_string_builder        avgt    5  15643.372 ± 4526.932  ns/op
    MergeStoreBench.putNull_unsafePutInt          avgt    5    280.570 ±    0.669  ns/op
    MergeStoreBench.putNull_utf16_arrayCopy       avgt    5  13053.191 ±   24.954  ns/op
    MergeStoreBench.putNull_utf16_string_builder  avgt    5  16349.747 ± 5029.799  ns/op
    MergeStoreBench.putNull_utf16_unsafePutLong   avgt    5    579.580 ±    0.710  ns/op

## 3. aliyun_ecs_c8i_x64 (CPU Intel Xeon Emerald Rapids)

    Benchmark                                     Mode  Cnt      Score      Error  Units
    MergeStoreBench.putNull_arraycopy             avgt    5   8029.622 ±   60.856  ns/op
    MergeStoreBench.putNull_getBytes              avgt    5   7444.635 ±   39.552  ns/op
    MergeStoreBench.putNull_getChars              avgt    5  16657.442 ±  147.301  ns/op
    MergeStoreBench.putNull_string_builder        avgt    5  23008.159 ± 6143.167  ns/op
    MergeStoreBench.putNull_unsafePutInt          avgt    5    235.302 ±    2.004  ns/op
    MergeStoreBench.putNull_utf16_arrayCopy       avgt    5  18330.317 ±  142.242  ns/op
    MergeStoreBench.putNull_utf16_string_builder  avgt    5  25843.593 ± 7089.392  ns/op
    MergeStoreBench.putNull_utf16_unsafePutLong   avgt    5   1860.076 ±   16.703  ns/op

## 4. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710)

    Benchmark                                     Mode  Cnt      Score      Error  Units
    MergeStoreBench.putNull_arraycopy             avgt    5   8114.176 ±   36.685  ns/op
    MergeStoreBench.putNull_getBytes              avgt    5   6171.538 ±    5.845  ns/op
    MergeStoreBench.putNull_getChars              avgt    5  10432.681 ±   26.401  ns/op
    MergeStoreBench.putNull_string_builder        avgt    5  21238.753 ± 1428.244  ns/op
    MergeStoreBench.putNull_unsafePutInt          avgt    5    349.233 ±    1.521  ns/op
    MergeStoreBench.putNull_utf16_arrayCopy       avgt    5  16063.018 ±   22.127  ns/op
    MergeStoreBench.putNull_utf16_string_builder  avgt    5  22327.827 ±  414.499  ns/op
    MergeStoreBench.putNull_utf16_unsafePutLong   avgt    5    863.733 ±    0.693  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2735425909

From swen at openjdk.org Wed Mar 19 06:18:39 2025
From: swen at openjdk.org (Shaojin Wen)
Date: Wed, 19 Mar 2025 06:18:39 GMT
Subject: RFR: 8352316: More MergeStoreBench
Message-ID: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com>

Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works

-------------

Commit messages:
 - more MergeStoreBench

Changes: https://git.openjdk.org/jdk/pull/24108/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24108&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8352316
  Stats: 116 lines in 1 file changed: 116 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/24108.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24108/head:pull/24108

PR: https://git.openjdk.org/jdk/pull/24108

From chagedorn at openjdk.org Wed Mar 19 07:37:07 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 19 Mar 2025 07:37:07 GMT
Subject: RFR: 8352248: Check if CMoveX is supported
In-Reply-To:
References:
Message-ID: <682nfn8AxjCdkuFWOrS38UqFMPG1c_0yRG8CCfBwWIE=.f965f3cb-5a50-465b-adc2-ce57b6cb3817@github.com>

On Tue, 18 Mar 2025 10:02:27 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review this patch?
>
> Currently, seems CMoveX are fully supported on most platforms, except of riscv64.
> On riscv64, there is no efficient way to implement CMoveF/D as other CMoveX (e.g. CMoveI), but it will still bring benefit by just supporting CMoveX without CMoveF/D. This patch is to supply such option.
>
> As other platforms already supported CMoveX, this patch should not impact them, as `!CMoveNode::supported(_igvn.type(phi))` should always be false.
> > BTW, in a subsequent pr for riscv, I'll implement CMoveX except of CMoveF/D, and also return false for CMoveF/D in Matcher::match_rule_supported. > > Thanks! Looks reasonable. src/hotspot/share/opto/movenode.cpp line 204: > 202: > 203: bool CMoveNode::supported(const Type* t) { > 204: switch( t->basic_type() ) { Suggestion: switch (t->basic_type()) { src/hotspot/share/opto/movenode.cpp line 214: > 212: default: > 213: ShouldNotReachHere(); > 214: return false; Indentation: Suggestion: ShouldNotReachHere(); return false; ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24095#pullrequestreview-2697245751 PR Review Comment: https://git.openjdk.org/jdk/pull/24095#discussion_r2002629715 PR Review Comment: https://git.openjdk.org/jdk/pull/24095#discussion_r2002630356 From chagedorn at openjdk.org Wed Mar 19 07:44:16 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 07:44:16 GMT Subject: Integrated: 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor In-Reply-To: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> References: <79STyS_P6MUAQGGxkEubh1zyBH9m5lGbb0O9vhR7TdU=.23a7da6c-1454-4846-9d5f-be4652ebfbec@github.com> Message-ID: On Wed, 12 Mar 2025 16:18:53 GMT, Christian Hagedorn wrote: > This patch cleans the Parse and Template Assertion Predicate elimination code up. We now use a single `PredicateVisitor` and share code in a new `EliminateUselessPredicates` class which contains the code previously found in `PhaseIdealLoop::eliminate_useless_predicates()`. 
> > ### Unified Logic to Clean Up Parse and Template Assertion Predicates > We now use the following algorithm: > https://github.com/openjdk/jdk/blob/5e4b6ca0ddafa80eee60690caacd257b74305d4e/src/hotspot/share/opto/predicates.cpp#L1174-L1179 > > This is different from the old algorithm where we used a single boolean state `_useless`. But that does no longer work because when we first mark Template Assertion Predicates useless, we are no longer visiting them when iterating through predicates: > > https://github.com/openjdk/jdk/blob/a21fa463c4f8d067c18c09a072f3cdfa772aea5e/src/hotspot/share/opto/predicates.hpp#L704-L708 > > We therefore require a third state. Thus, I introduced a new tri-state `PredicateState` that provides a special `MaybeUseful` value which we can set each Predicate to. > > #### Ignoring Useless Parse Predicates > While working on this patch, I've noticed that we are always visiting Parse Predicates - even when they useless. We should change that to align it with what we have for the other Predicates (changed in JDK-8351280). To make this work, we also replace the `_useless` state in `ParsePredicateNode` with a new `PredicateState`. > > #### Sharing Code for Parse and Template Assertion Predicates > With all the mentioned changes in place, I could nicely share code for the elimination of Parse and Template Assertion Predicates in `EliminateUselessPredicates` by using templates. The following additional changes were required: > > - Changing the template parameter of `_template_assertion_predicate_opaques` to the more specific `OpaqueTemplateAssertionPredicateNode` type. > - Adding accessor methods to get the Predicate lists from `Compile`. > - Updating `ParsePredicate::mark_useless()` to pass in `PhaseIterGVN`, as done for Assertion Predicates > > Note that we still do not directly replace the useless Predicates but rather mark them useless as initiated by JDK-8351280. 
> > ### Other Included Changes > - During the various refactoring steps, I somehow dropped the code to add newly cloned Template Assertion Predicate to the `_template_assertion_predicate_opaques` list. It was done directly in the old cloning methods. This is not relevant for correctness but could hinder some optimizations. I've added the code now i... This pull request has now been integrated. Changeset: e57b2725 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/e57b2725065eaa79db7f9132f4152cbae9279f8e Stats: 460 lines in 12 files changed: 263 ins; 143 del; 54 mod 8350578: Refactor useless Parse and Template Assertion Predicate elimination code by using a PredicateVisitor Reviewed-by: epeter, kvn, roland ------------- PR: https://git.openjdk.org/jdk/pull/24013 From duke at openjdk.org Wed Mar 19 08:12:27 2025 From: duke at openjdk.org (Marc Chevalier) Date: Wed, 19 Mar 2025 08:12:27 GMT Subject: RFR: 8314999: IR framework fails to detect allocation [v2] In-Reply-To: References: Message-ID: > Bring the `ALLOC(_ARRAY)?(_OF)?` IR framework regex in the modern era! > > Rather than matching on the OptoAssembly, we now match before macro expansion. Ideed, matching on OptoAssembly is fragile: between the register being assigned the class to allocate and the actual call to `_new_instance_Java`, there might > be register spill, making the match hard, and fragile. Now, these regex are applied before macro expansion. > > To make that work, I needed to adapt the dump_spec of `AllocateNode` to print the allocated class, in case it is a precise constant. This is also nice to have as a human reading the output. > > The new feature is also slightly more precise in case we want to match the allocation of a given class (that is `ALLOC(_ARRAY)?_OF`). It used to be along the lines of: > > "precise .*" + IS_REPLACED + ":" > > which is actually too lenient: it only assert the suffix is what is expected. 
On the plus side, if we wanted to have `MyClass`, then `some/package/MyClass` would match, but on the other hand, `ItLooksLikeButIsNotMyClass` would also match. The new regex use a word boundary: > > "precise .*\\b" + IS_REPLACED + ":" > > which make it a bit more specific: the given name cannot be extended with only letters: there must be a non-letter char. For instance, `ItLooksLikeButIsNotMyClass` wouldn't work anymore, since there is no word boundary between `ItLooksLikeButIsNot` and `MyClass`. > > It is not quite fool-proof since a package path can still be extended, e.g. > > @IR(counts = {IRNode.ALLOC_OF, "some/package/MyClass", "1"}) > > will also match allocations of `a/prefix/some/package/MyClass`. > > I think it's an acceptable limitation. > > Let's have a little preview of the new `dump_spec`. For instance, it could have been something like: > > 140 Allocate === 120 117 137 8 1 (93 138 23 1 1 10 43 43 10 43 ) [[ 141 142 143 150 151 152 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Test$MyClass:: @ bci:13 (line 41) Test::test @ bci:5 (line 46) !jvms: Test$MyClass:: @ bci:13 (line 41) Test::test @ bci:5 (line 46) > > and now it is > > 140 Allocate === 120 117 137 8 1 (93 138 23 1 1 10 43 43 10 43 ) [[ 141 142 143 150 151 152 ]] precise java/util/HashSet: 0x00007fe2244ccd28 (java/lang/Cloneable,java/io/Serializable,java/lang/Iterable,java/util/Collection,java/util/Set):Constant:exact * rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Test$MyClass:: @ bci:13 (line 41) Test::test @ bci:5 (line 46) !jvms: Test$MyClass:: @ bci... 
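To illustrate the word-boundary change described in the quoted text, here is a small sketch using `std::regex` (ECMAScript grammar) rather than the Java regex engine the IR framework uses. The helper names and the sample dump lines are assumptions made up for this example; the class name is spliced in where the framework uses `IS_REPLACED`, and names containing regex metacharacters would need escaping first.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Old shape: any characters may directly precede the class name.
bool matches_lenient(const std::string& dump_line, const std::string& klass) {
  return std::regex_search(dump_line, std::regex("precise .*" + klass + ":"));
}

// New shape: a word boundary must sit directly before the class name.
bool matches_with_boundary(const std::string& dump_line, const std::string& klass) {
  return std::regex_search(dump_line, std::regex("precise .*\\b" + klass + ":"));
}

// Walks through the cases mentioned in the description.
void check_examples() {
  // The lenient pattern accepts a class whose name merely ends in "MyClass".
  assert(matches_lenient("precise ItLooksLikeButIsNotMyClass: 0x1", "MyClass"));
  // The word boundary rejects it: 't' followed by 'M' is not a boundary.
  assert(!matches_with_boundary("precise ItLooksLikeButIsNotMyClass: 0x1", "MyClass"));
  // A package-qualified name still matches, since '/' is a non-word character.
  assert(matches_with_boundary("precise some/package/MyClass: 0x1", "MyClass"));
  // Known limitation: a longer package prefix is accepted as well.
  assert(matches_with_boundary("precise a/prefix/some/package/MyClass: 0x1",
                               "some/package/MyClass"));
}
```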
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: more compact printing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24093/files - new: https://git.openjdk.org/jdk/pull/24093/files/07928fa2..e1245125 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24093&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24093&range=00-01 Stats: 96 lines in 5 files changed: 84 ins; 6 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24093/head:pull/24093 PR: https://git.openjdk.org/jdk/pull/24093 From duke at openjdk.org Wed Mar 19 08:39:52 2025 From: duke at openjdk.org (Marc Chevalier) Date: Wed, 19 Mar 2025 08:39:52 GMT Subject: RFR: 8314999: IR framework fails to detect allocation [v3] In-Reply-To: References: Message-ID: > Bring the `ALLOC(_ARRAY)?(_OF)?` IR framework regex in the modern era! > > Rather than matching on the OptoAssembly, we now match before macro expansion. Ideed, matching on OptoAssembly is fragile: between the register being assigned the class to allocate and the actual call to `_new_instance_Java`, there might > be register spill, making the match hard, and fragile. Now, these regex are applied before macro expansion. > > To make that work, I needed to adapt the dump_spec of `AllocateNode` to print the allocated class, in case it is a precise constant. This is also nice to have as a human reading the output. > > The new feature is also slightly more precise in case we want to match the allocation of a given class (that is `ALLOC(_ARRAY)?_OF`). It used to be along the lines of: > > "precise .*" + IS_REPLACED + ":" > > which is actually too lenient: it only assert the suffix is what is expected. On the plus side, if we wanted to have `MyClass`, then `some/package/MyClass` would match, but on the other hand, `ItLooksLikeButIsNotMyClass` would also match. 
The new regex use a word boundary: > > "precise .*\\b" + IS_REPLACED + ":" > > which make it a bit more specific: the given name cannot be extended with only letters: there must be a non-letter char. For instance, `ItLooksLikeButIsNotMyClass` wouldn't work anymore, since there is no word boundary between `ItLooksLikeButIsNot` and `MyClass`. > > It is not quite fool-proof since a package path can still be extended, e.g. > > @IR(counts = {IRNode.ALLOC_OF, "some/package/MyClass", "1"}) > > will also match allocations of `a/prefix/some/package/MyClass`. > > I think it's an acceptable limitation. > > Let's have a little preview of the new `dump_spec`. For instance, it could have been something like: > > 140 Allocate === 120 117 137 8 1 (93 138 23 1 1 10 43 43 10 43 ) [[ 141 142 143 150 151 152 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Test$MyClass:: @ bci:13 (line 41) Test::test @ bci:5 (line 46) !jvms: Test$MyClass:: @ bci:13 (line 41) Test::test @ bci:5 (line 46) > > and now it is > > 140 Allocate === 120 117 137 8 1 (93 138 23 1 1 10 43 43 10 43 ) [[ 141 142 143 150 151 152 ]] precise java/util/HashSet: 0x00007fe2244ccd28 (java/lang/Cloneable,java/io/Serializable,java/lang/Iterable,java/util/Collection,java/util/Set):Constant:exact * rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Test$MyClass:: @ bci:13 (line 41) Test::test @ bci:5 (line 46) !jvms: Test$MyClass:: @ bci... 
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Avoid right-expansion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24093/files - new: https://git.openjdk.org/jdk/pull/24093/files/e1245125..cc477143 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24093&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24093&range=01-02 Stats: 18 lines in 2 files changed: 10 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24093/head:pull/24093 PR: https://git.openjdk.org/jdk/pull/24093 From chagedorn at openjdk.org Wed Mar 19 09:04:09 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 09:04:09 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v2] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 08:45:34 GMT, Roland Westrelin wrote: >> The `arraycopy` writes to a non escaping array so its `ArrayCopy` node >> is marked as having a narrow memory effect. One of the loads from the >> destination after the copy is transformed into a load from the source >> array (the rationale being that if there's no load from the >> destination of the copy, the `arraycopy` is not needed). The load from >> the source has the input memory state of the `ArrayCopy` as memory >> input. That load is then sunk out of the loop and its control is >> updated to be after the `ArrayCopy`. That's legal because the >> `ArrayCopy` only has a narrow memory effect and can't modify the >> source. The `ArrayCopy` can't be eliminated and is expanded. In the >> process, a `MemBar` that has a wide memory effect is added. 
The load >> from the source has control after the membar but memory state before >> and because the membar has a wide memory effect, the load is anti >> dependent on the membar: the graph is broken (the load can't be pinned >> after the membar and anti dependent on it). >> >> In short, the problem is that the graph is transformed under the >> assumption that the `ArrayCopy` has a narrow effect but the >> `ArrayCopy` is expanded to a subgraph that has a wide memory >> effect. The fix I propose is to not insert a membar with a wide memory >> effect. We still need a membar when the destination is non escaping >> because the expanded `ArrayCopy`, if it writes to a tighly allocated >> array, writes to raw memory and not to the destination memory slice. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review test/hotspot/jtreg/compiler/arraycopy/TestSunkLoadAntiDependency.java line 29: > 27: * @summary C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure > 28: * @run main/othervm -XX:-TieredCompilation -XX:-BackgroundCompilation -XX:-UseOnStackReplacement > 29: * -XX:CompileOnly=TestSunkLoadAntiDependency::test1 TestSunkLoadAntiDependency Drive by comments: Is `-XX:-UseOnStackReplacement` required to reproduce the issue? There was also a crash when running with `-XX:+TraceLoopOpts`. Can you also add a run with that flag to verify that this patch also fixes that? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2002805925 From duke at openjdk.org Wed Mar 19 09:05:45 2025 From: duke at openjdk.org (Marc Chevalier) Date: Wed, 19 Mar 2025 09:05:45 GMT Subject: RFR: 8314999: IR framework fails to detect allocation [v4] In-Reply-To: References: Message-ID: > Bring the `ALLOC(_ARRAY)?(_OF)?` IR framework regex in the modern era! > > Rather than matching on the OptoAssembly, we now match before macro expansion. 
Ideed, matching on OptoAssembly is fragile: between the register being assigned the class to allocate and the actual call to `_new_instance_Java`, there might > be register spill, making the match hard, and fragile. Now, these regex are applied before macro expansion. > > To make that work, I needed to adapt the dump_spec of `AllocateNode` to print the allocated class, in case it is a precise constant. This is also nice to have as a human reading the output. > > The new feature is also slightly more precise in case we want to match the allocation of a given class (that is `ALLOC(_ARRAY)?_OF`). It used to be along the lines of: > > "precise .*" + IS_REPLACED + ":" > > which is actually too lenient: it only assert the suffix is what is expected. On the plus side, if we wanted to have `MyClass`, then `some/package/MyClass` would match, but on the other hand, `ItLooksLikeButIsNotMyClass` would also match. The new regex use a word boundary: > > "allocationKlass:.*\\b" + IS_REPLACED + "\\s" > > which make it a bit more specific: the given name cannot be extended with only letters: there must be a non-letter char. For instance, `ItLooksLikeButIsNotMyClass` wouldn't work anymore, since there is no word boundary between `ItLooksLikeButIsNot` and `MyClass`. It can also not be extended on the right into `MyClassButNotReally` because of the space character that must exist after the searched string. > > The case of array allocations is slightly more tricky, but essentially similar. > > It is not quite fool-proof since a package path can still be extended, e.g. > > @IR(counts = {IRNode.ALLOC_OF, "some/package/MyClass", "1"}) > > will also match allocations of `a/prefix/some/package/MyClass`. > > I think it's an acceptable limitation. > > Let's have a little preview of the new `dump_spec`. 
For instance, it could have been something like: > > 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) !jvms: Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) > > and now it is > > 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) allocationKlass:java/util/Ha... Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: revert also formatting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24093/files - new: https://git.openjdk.org/jdk/pull/24093/files/cc477143..9ecb2d6c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24093&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24093&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24093/head:pull/24093 PR: https://git.openjdk.org/jdk/pull/24093 From roland at openjdk.org Wed Mar 19 09:39:08 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 19 Mar 2025 09:39:08 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: <2OJ3-DbdBBiHjZKVDGNXb-IFLqXi0jI4Ezz_zs6NsMA=.a904b56a-a3c4-4f06-a708-70223beccb60@github.com> Message-ID: On Tue, 18 Mar 2025 14:48:22 GMT, Christian Hagedorn wrote: > Also for the type being zero on the div by zero failing path which lets some type nodes die and control is not because we don't have an "everything but zero" type. Is there a bug/test case for that one? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2735920086 From mli at openjdk.org Wed Mar 19 09:43:24 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 19 Mar 2025 09:43:24 GMT Subject: RFR: 8352248: Check if CMoveX is supported [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help review this patch? > > Currently, it seems CMoveX is fully supported on most platforms, except riscv64. > On riscv64, there is no efficient way to implement CMoveF/D like the other CMoveX nodes (e.g. CMoveI), but there is still a benefit in supporting CMoveX without CMoveF/D. This patch provides that option. > > As other platforms already support CMoveX, this patch should not impact them, as `!CMoveNode::supported(_igvn.type(phi))` should always be false. > > BTW, in a subsequent PR for riscv, I'll implement CMoveX except for CMoveF/D, and also return false for CMoveF/D in Matcher::match_rule_supported. > > Thanks! Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: minor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24095/files - new: https://git.openjdk.org/jdk/pull/24095/files/6cfbe221..5a92aa07 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24095&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24095&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24095.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24095/head:pull/24095 PR: https://git.openjdk.org/jdk/pull/24095 From mli at openjdk.org Wed Mar 19 09:43:25 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 19 Mar 2025 09:43:25 GMT Subject: RFR: 8352248: Check if CMoveX is supported [v2] In-Reply-To: <682nfn8AxjCdkuFWOrS38UqFMPG1c_0yRG8CCfBwWIE=.f965f3cb-5a50-465b-adc2-ce57b6cb3817@github.com> References: <682nfn8AxjCdkuFWOrS38UqFMPG1c_0yRG8CCfBwWIE=.f965f3cb-5a50-465b-adc2-ce57b6cb3817@github.com> Message-ID: On Wed, 19 Mar
2025 07:34:07 GMT, Christian Hagedorn wrote: > Looks reasonable. Thank you! > src/hotspot/share/opto/movenode.cpp line 204: > >> 202: >> 203: bool CMoveNode::supported(const Type* t) { >> 204: switch( t->basic_type() ) { > > Suggestion: > > switch (t->basic_type()) { Fixed. > src/hotspot/share/opto/movenode.cpp line 214: > >> 212: default: >> 213: ShouldNotReachHere(); >> 214: return false; > > Indentation: > Suggestion: > > ShouldNotReachHere(); > return false; Fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24095#issuecomment-2735937111 PR Review Comment: https://git.openjdk.org/jdk/pull/24095#discussion_r2002895314 PR Review Comment: https://git.openjdk.org/jdk/pull/24095#discussion_r2002895502 From mli at openjdk.org Wed Mar 19 09:45:08 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 19 Mar 2025 09:45:08 GMT Subject: RFR: 8320997: RISC-V: C2 ReverseV In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 02:41:49 GMT, Fei Yang wrote: > That looks great to me. Thanks! Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24096#issuecomment-2735942934 From mli at openjdk.org Wed Mar 19 09:45:15 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 19 Mar 2025 09:45:15 GMT Subject: RFR: 8351902: RISC-V: Several tests fail after JDK-8351145 [v2] In-Reply-To: <5KTCR5Au1rdzq225ASnfXd6Rdyp380-zXakiPJGJlfE=.6d4129e2-0d2c-4034-ae5e-aaca7f93d226@github.com> References: <5KTCR5Au1rdzq225ASnfXd6Rdyp380-zXakiPJGJlfE=.6d4129e2-0d2c-4034-ae5e-aaca7f93d226@github.com> Message-ID: On Wed, 19 Mar 2025 02:43:57 GMT, Fei Yang wrote: > Thanks for finding this out! Thank you! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24027#issuecomment-2735941147 From mli at openjdk.org Wed Mar 19 09:45:16 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 19 Mar 2025 09:45:16 GMT Subject: Integrated: 8351902: RISC-V: Several tests fail after JDK-8351145 In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 08:57:48 GMT, Hamlin Li wrote: > Hi, > Can you help review this simple patch? > These client tests do not seem that useful to me, so the simple solution could be to just disable them on riscv. > > Thanks! This pull request has now been integrated. Changeset: c2be19c2 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/c2be19c261ba45df29865077b511c49bb61433a6 Stats: 6 lines in 3 files changed: 6 ins; 0 del; 0 mod 8351902: RISC-V: Several tests fail after JDK-8351145 Reviewed-by: rehn, fyang, syan ------------- PR: https://git.openjdk.org/jdk/pull/24027 From amitkumar at openjdk.org Wed Mar 19 09:50:15 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 19 Mar 2025 09:50:15 GMT Subject: RFR: 8341908: CodeHeapAnalytics: Output Imperfections and unwanted vm termination [v3] In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 21:06:48 GMT, Lutz Schmidt wrote: >> Output is properly aligned again now. It was messed up when method hotness was removed (part of method sweeper). >> Assertions have been replaced by printing an error message and gracefully returning. This avoids vm crashes caused by diagnostic actions. >> Some code restructuring, removal of redundancies. >> >> Reviews are highly welcome. > > Lutz Schmidt has updated the pull request incrementally with one additional commit since the last revision: > > 8341908: fix make error src/hotspot/share/code/codeHeapState.cpp line 2391: > 2389: > 2390: void CodeHeapState::print_line_delim(outputStream* out, bufferedStream* ast, char* low_bound, unsigned int ix, unsigned int gpl) { > 2391: // Note: out and ast MUST NOT designate the SAME stream! Was this intentional?
Wouldn't an assert be better than a Note :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21452#discussion_r2002910506 From duke at openjdk.org Wed Mar 19 10:02:16 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Wed, 19 Mar 2025 10:02:16 GMT Subject: Integrated: 8350485: C2: factor out common code in Node::grow() and Node::out_grow() In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 09:08:41 GMT, Saranya Natarajan wrote: > Node::grow() and Node::out_grow() are copy-pasted from each other and their core logic could be factored out into a third function or at least cleaned up. Hence, the fix includes a function Node::array_resize() that implements the core logic of Node::grow() and Node::out_grow(). > > Link to GitHub action which had no failures : https://github.com/sarannat/jdk/actions/runs/13677508359 This pull request has now been integrated. Changeset: 8f64ccc0 Author: Saranya Natarajan Committer: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/8f64ccc01b8c692b59e81255c59c333cc23e834d Stats: 49 lines in 2 files changed: 10 ins; 17 del; 22 mod 8350485: C2: factor out common code in Node::grow() and Node::out_grow() Reviewed-by: thartmann, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/23928 From jwaters at openjdk.org Wed Mar 19 10:39:09 2025 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 19 Mar 2025 10:39:09 GMT Subject: RFR: 8350609: cleanup unknown unwind opcode (0xB) for windows In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 03:58:17 GMT, Dhamoder Nalla wrote: > This PR is to clean-up unknown unwind opcodes (0xB) in Windows intrinsic functions introduced in commit https://github.com/openjdk/jdk17u-dev/commit/9f05c411e6d6bdf612cf0cf8b9fe4ca9ecde50d1#diff-a024df6bcd94607260545e647922261703a652dee1afadb1fa758f6e74a568d1 > > ![image](https://github.com/user-attachments/assets/5b295365-ba8e-4fd6-8b8b-f7243f80a496) > > According to the Windows unwind Opcodes outlined at
https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170#unwind-operation-code, the opcode 0xB (1011) is not a valid Opcode, as the valid opcodes range from 0 to 10. This looks ok to me ------------- PR Review: https://git.openjdk.org/jdk/pull/23707#pullrequestreview-2697875516 From adinn at openjdk.org Wed Mar 19 10:56:11 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 19 Mar 2025 10:56:11 GMT Subject: RFR: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 [v5] In-Reply-To: References: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> <2VksBtd_XqgUIQpirjTmAkXUpVZPtahtmLfIoEVRC0A=.895aa101-0baa-461c-970d-b95f146a4f9a@github.com> Message-ID: On Wed, 19 Mar 2025 00:41:55 GMT, Dean Long wrote: >> Andrew Dinn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - use references and const to avoid VSeq copying and fix int array arg issue >> - fix comment >> - fix invalid register argument >> - fix errors in comments >> - fix whitespace errors >> - Clearer implementation of AArch64 dilithium generator > > I'm not an expert, but this looks good overall, and I'm relying on Andrew's testing to verify the details. @dean-long Thanks for the review! @ferakocz Do you have any comments to add regarding this restructuring? In particular, is the information in the comments clear and correct? I'd prefer confirmation from you that the patch is ok before committing this change. I would also like to investigate whether your [ML_KEM PR](https://git.openjdk.org/jdk/pull/23663) might benefit from adopting a similar restructuring (i.e. using operations scheduled over vector sequences). It looks to me like that would be the case. Could you comment on that? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24026#issuecomment-2736189193 From chagedorn at openjdk.org Wed Mar 19 11:54:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 11:54:07 GMT Subject: RFR: 8352248: Check if CMoveX is supported [v2] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 09:43:24 GMT, Hamlin Li wrote: >> Hi, >> Can you help review this patch? >> >> Currently, it seems CMoveX is fully supported on most platforms, except riscv64. >> On riscv64, there is no efficient way to implement CMoveF/D like the other CMoveX nodes (e.g. CMoveI), but there is still a benefit in supporting CMoveX without CMoveF/D. This patch provides that option. >> >> As other platforms already support CMoveX, this patch should not impact them, as `!CMoveNode::supported(_igvn.type(phi))` should always be false. >> >> BTW, in a subsequent PR for riscv, I'll implement CMoveX except for CMoveF/D, and also return false for CMoveF/D in Matcher::match_rule_supported. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor Update looks good, thanks. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24095#pullrequestreview-2698114478 From chagedorn at openjdk.org Wed Mar 19 11:59:09 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 11:59:09 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: <2OJ3-DbdBBiHjZKVDGNXb-IFLqXi0jI4Ezz_zs6NsMA=.a904b56a-a3c4-4f06-a708-70223beccb60@github.com> Message-ID: On Wed, 19 Mar 2025 09:36:16 GMT, Roland Westrelin wrote: > > Also for the type being zero on the div by zero failing path which lets some type nodes die and control is not because we don't have an "everything but zero" type. > > Is there a bug/test case for that one?
I think it was that one: https://github.com/openjdk/jdk/pull/16844 and related/linked issues. We just removed the `CastIINode::Value()` type improvement as a point fix - maybe that can also be reverted with your patch as a follow-up. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2736368410 From chagedorn at openjdk.org Wed Mar 19 12:48:11 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 12:48:11 GMT Subject: RFR: 8314999: IR framework fails to detect allocation [v4] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 09:05:45 GMT, Marc Chevalier wrote: >> Bring the `ALLOC(_ARRAY)?(_OF)?` IR framework regex into the modern era! >> >> Rather than matching on the OptoAssembly, we now match before macro expansion. Indeed, matching on OptoAssembly is fragile: between the register being assigned the class to allocate and the actual call to `_new_instance_Java`, there might be a register spill, making the match hard and fragile. Now, these regexes are applied before macro expansion. >> >> To make that work, I needed to adapt the dump_spec of `AllocateNode` to print the allocated class, in case it is a precise constant. This is also nice to have as a human reading the output. >> >> The new feature is also slightly more precise in case we want to match the allocation of a given class (that is `ALLOC(_ARRAY)?_OF`). It used to be along the lines of: >> >> "precise .*" + IS_REPLACED + ":" >> >> which is actually too lenient: it only asserts that the suffix is what is expected. On the plus side, if we wanted to have `MyClass`, then `some/package/MyClass` would match, but on the other hand, `ItLooksLikeButIsNotMyClass` would also match. The new regex uses a word boundary: >> >> "allocationKlass:.*\\b" + IS_REPLACED + "\\s" >> >> which makes it a bit more specific: the given name cannot be extended with only letters: there must be a non-letter char.
For instance, `ItLooksLikeButIsNotMyClass` wouldn't work anymore, since there is no word boundary between `ItLooksLikeButIsNot` and `MyClass`. It can also not be extended on the right into `MyClassButNotReally` because of the space character that must exist after the searched string. >> >> The case of array allocations is slightly more tricky, but essentially similar. >> >> It is not quite fool-proof since a package path can still be extended, e.g. >> >> @IR(counts = {IRNode.ALLOC_OF, "some/package/MyClass", "1"}) >> >> will also match allocations of `a/prefix/some/package/MyClass`. >> >> I think it's an acceptable limitation. >> >> Let's have a little preview of the new `dump_spec`. For instance, it could have been something like: >> >> 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) !jvms: Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) >> >> and now it is >> >> 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/l... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > revert also formatting Great work! It's good to see that allocations can finally be matched on the ideal graph instead of the fragile and platform dependent `PrintOptoAssembly` output. Also nice that you caught some problems with inexactly matching class names and improving these with more powerful regexes! I only have a small comment, otherwise, it looks good to me. 
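[Editorial illustration, not part of the original thread.] The difference between the old suffix-only pattern and the new word-boundary pattern discussed above can be sketched in plain Java. This is only a minimal sketch under stated assumptions: the class `AllocRegexSketch`, its two helper methods, and the sample input lines are hypothetical and are not the IR framework's actual code or real `dump_spec` output; the class name is simply concatenated into the pattern in place of the `IS_REPLACED` placeholder, without any regex quoting.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: mimics the two regex shapes quoted in the PR description.
public class AllocRegexSketch {

    // Old shape: only checks that the line contains "precise ...<name>:",
    // so any class whose name merely ends with <name> also matches.
    static boolean oldStyleMatches(String className, String line) {
        return Pattern.compile("precise .*" + className + ":").matcher(line).find();
    }

    // New shape: requires a word boundary before <name> and whitespace after it.
    static boolean newStyleMatches(String className, String line) {
        return Pattern.compile("allocationKlass:.*\\b" + className + "\\s").matcher(line).find();
    }

    public static void main(String[] args) {
        // The sample lines below are made up for illustration.
        System.out.println(oldStyleMatches("MyClass", "precise ItLooksLikeButIsNotMyClass: "));         // true (too lenient)
        System.out.println(newStyleMatches("MyClass", "allocationKlass:some/package/MyClass "));         // true
        System.out.println(newStyleMatches("MyClass", "allocationKlass:ItLooksLikeButIsNotMyClass "));   // false
        System.out.println(newStyleMatches("MyClass", "allocationKlass:MyClassButNotReally "));          // false
        // The accepted limitation: a longer package prefix still matches.
        System.out.println(newStyleMatches("some/package/MyClass",
                                           "allocationKlass:a/prefix/some/package/MyClass "));           // true
    }
}
```

The `/` before `MyClass` counts as a non-word character, which is why a package-qualified name still satisfies `\b`, while a longer identifier such as `ItLooksLikeButIsNotMyClass` does not.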
src/hotspot/share/opto/callnode.hpp line 1068: > 1066: #ifndef PRODUCT > 1067: virtual void dump_spec(outputStream* st) const; > 1068: #endif For single-line non-product definitions, you can use: Suggestion: NOT_PRODUCT(virtual void dump_spec(outputStream* st) const;) test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java line 116: > 114: ); > 115: > 116: runCheck(BadFailOnConstraint.create(AllocInstance.class, "allocInstance()", 1), Nice additional tests :-) ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24093#pullrequestreview-2698257623 PR Review Comment: https://git.openjdk.org/jdk/pull/24093#discussion_r2003227244 PR Review Comment: https://git.openjdk.org/jdk/pull/24093#discussion_r2003246000 From duke at openjdk.org Wed Mar 19 12:58:16 2025 From: duke at openjdk.org (Marc Chevalier) Date: Wed, 19 Mar 2025 12:58:16 GMT Subject: RFR: 8314999: IR framework fails to detect allocation [v5] In-Reply-To: References: Message-ID: > Bring the `ALLOC(_ARRAY)?(_OF)?` IR framework regex into the modern era! > > Rather than matching on the OptoAssembly, we now match before macro expansion. Indeed, matching on OptoAssembly is fragile: between the register being assigned the class to allocate and the actual call to `_new_instance_Java`, there might be a register spill, making the match hard and fragile. Now, these regexes are applied before macro expansion. > > To make that work, I needed to adapt the dump_spec of `AllocateNode` to print the allocated class, in case it is a precise constant. This is also nice to have as a human reading the output. > > The new feature is also slightly more precise in case we want to match the allocation of a given class (that is `ALLOC(_ARRAY)?_OF`). It used to be along the lines of: > > "precise .*" + IS_REPLACED + ":" > > which is actually too lenient: it only asserts that the suffix is what is expected.
On the plus side, if we wanted to have `MyClass`, then `some/package/MyClass` would match, but on the other hand, `ItLooksLikeButIsNotMyClass` would also match. The new regex uses a word boundary: > > "allocationKlass:.*\\b" + IS_REPLACED + "\\s" > > which makes it a bit more specific: the given name cannot be extended with only letters: there must be a non-letter char. For instance, `ItLooksLikeButIsNotMyClass` wouldn't work anymore, since there is no word boundary between `ItLooksLikeButIsNot` and `MyClass`. It also cannot be extended on the right into `MyClassButNotReally` because of the space character that must exist after the searched string. > > The case of array allocations is slightly more tricky, but essentially similar. > > It is not quite fool-proof since a package path can still be extended, e.g. > > @IR(counts = {IRNode.ALLOC_OF, "some/package/MyClass", "1"}) > > will also match allocations of `a/prefix/some/package/MyClass`. > > I think it's an acceptable limitation. > > Let's have a little preview of the new `dump_spec`. For instance, it could have been something like: > > 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) !jvms: Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) > > and now it is > > 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) allocationKlass:java/util/Ha...
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: replace #ifndef PRODUCT with NOT_PRODUCT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24093/files - new: https://git.openjdk.org/jdk/pull/24093/files/9ecb2d6c..cb8a71cb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24093&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24093&range=03-04 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24093/head:pull/24093 PR: https://git.openjdk.org/jdk/pull/24093 From duke at openjdk.org Wed Mar 19 12:58:17 2025 From: duke at openjdk.org (Marc Chevalier) Date: Wed, 19 Mar 2025 12:58:17 GMT Subject: RFR: 8314999: IR framework fails to detect allocation [v4] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 12:37:19 GMT, Christian Hagedorn wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> revert also formatting > > src/hotspot/share/opto/callnode.hpp line 1068: > >> 1066: #ifndef PRODUCT >> 1067: virtual void dump_spec(outputStream* st) const; >> 1068: #endif > > For single line not product `defs`, you can use: > Suggestion: > > NOT_PRODUCT(virtual void dump_spec(outputStream* st) const;) Done. It is shorter, and I don't see a good reason not to use it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24093#discussion_r2003264188 From chagedorn at openjdk.org Wed Mar 19 13:04:23 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 13:04:23 GMT Subject: RFR: 8314999: IR framework fails to detect allocation [v5] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 12:58:16 GMT, Marc Chevalier wrote: >> Bring the `ALLOC(_ARRAY)?(_OF)?` IR framework regex in the modern era! 
>> >> Rather than matching on the OptoAssembly, we now match before macro expansion. Indeed, matching on OptoAssembly is fragile: between the register being assigned the class to allocate and the actual call to `_new_instance_Java`, there might be a register spill, making the match hard and fragile. Now, these regexes are applied before macro expansion. >> >> To make that work, I needed to adapt the dump_spec of `AllocateNode` to print the allocated class, in case it is a precise constant. This is also nice to have as a human reading the output. >> >> The new feature is also slightly more precise in case we want to match the allocation of a given class (that is `ALLOC(_ARRAY)?_OF`). It used to be along the lines of: >> >> "precise .*" + IS_REPLACED + ":" >> >> which is actually too lenient: it only asserts that the suffix is what is expected. On the plus side, if we wanted to have `MyClass`, then `some/package/MyClass` would match, but on the other hand, `ItLooksLikeButIsNotMyClass` would also match. The new regex uses a word boundary: >> >> "allocationKlass:.*\\b" + IS_REPLACED + "\\s" >> >> which makes it a bit more specific: the given name cannot be extended with only letters: there must be a non-letter char. For instance, `ItLooksLikeButIsNotMyClass` wouldn't work anymore, since there is no word boundary between `ItLooksLikeButIsNot` and `MyClass`. It also cannot be extended on the right into `MyClassButNotReally` because of the space character that must exist after the searched string. >> >> The case of array allocations is slightly more tricky, but essentially similar. >> >> It is not quite fool-proof since a package path can still be extended, e.g. >> >> @IR(counts = {IRNode.ALLOC_OF, "some/package/MyClass", "1"}) >> >> will also match allocations of `a/prefix/some/package/MyClass`. >> >> I think it's an acceptable limitation. >> >> Let's have a little preview of the new `dump_spec`.
For instance, it could have been something like: >> >> 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) !jvms: Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) >> >> and now it is >> >> 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/l... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > replace #ifndef PRODUCT with NOT_PRODUCT test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java line 75: > 73: runWithArguments(MultipleFailOnGood.class, "-XX:TLABRefillWasteFraction=50"); > 74: > 75: runCheck(new String[] {"-XX:TLABRefillWasteFraction=50", "-XX:+UsePerfData", "-XX:+UseTLAB"}, BadFailOnConstraint.create(AndOr1.class, "test1(int)", 1, "CallStaticJava")); You can also update the copyright year of that file. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24093#discussion_r2003276042 From epeter at openjdk.org Wed Mar 19 13:10:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Mar 2025 13:10:32 GMT Subject: RFR: 8350609: cleanup unknown unwind opcode (0xB) for windows In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 03:58:17 GMT, Dhamoder Nalla wrote: > This PR is to clean-up unknown unwind opcodes (0xB) in Windows intrinsic functions introduced in commit https://github.com/openjdk/jdk17u-dev/commit/9f05c411e6d6bdf612cf0cf8b9fe4ca9ecde50d1#diff-a024df6bcd94607260545e647922261703a652dee1afadb1fa758f6e74a568d1 > > ![image](https://github.com/user-attachments/assets/5b295365-ba8e-4fd6-8b8b-f7243f80a496) > > According to the Windows unwind Opcodes outlined at https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170#unwind-operation-code, the opcode 0xB (1011) is not a valid Opcode, as the valid opcodes range from 0 to 10. @dhanalla As @vivdesh asked above: do you have a regression test for this? You also have this warning above: Warning: Found leading lowercase letter in issue title for 8350609: cleanup unknown unwind opcode (0xB) for windows ------------- PR Comment: https://git.openjdk.org/jdk/pull/23707#issuecomment-2736576656 From duke at openjdk.org Wed Mar 19 13:26:51 2025 From: duke at openjdk.org (Marc Chevalier) Date: Wed, 19 Mar 2025 13:26:51 GMT Subject: RFR: 8314999: IR framework fails to detect allocation [v6] In-Reply-To: References: Message-ID: > Bring the `ALLOC(_ARRAY)?(_OF)?` IR framework regex into the modern era! > > Rather than matching on the OptoAssembly, we now match before macro expansion. Indeed, matching on OptoAssembly is fragile: between the register being assigned the class to allocate and the actual call to `_new_instance_Java`, there might be a register spill, making the match hard and fragile. Now, these regexes are applied before macro expansion.
> > To make that work, I needed to adapt the dump_spec of `AllocateNode` to print the allocated class, in case it is a precise constant. This is also nice to have as a human reading the output. > > The new feature is also slightly more precise in case we want to match the allocation of a given class (that is `ALLOC(_ARRAY)?_OF`). It used to be along the lines of: > > "precise .*" + IS_REPLACED + ":" > > which is actually too lenient: it only asserts that the suffix is what is expected. On the plus side, if we wanted to have `MyClass`, then `some/package/MyClass` would match, but on the other hand, `ItLooksLikeButIsNotMyClass` would also match. The new regex uses a word boundary: > > "allocationKlass:.*\\b" + IS_REPLACED + "\\s" > > which makes it a bit more specific: the given name cannot be extended with only letters: there must be a non-letter char. For instance, `ItLooksLikeButIsNotMyClass` wouldn't work anymore, since there is no word boundary between `ItLooksLikeButIsNot` and `MyClass`. It also cannot be extended on the right into `MyClassButNotReally` because of the space character that must exist after the searched string. > > The case of array allocations is slightly more tricky, but essentially similar. > > It is not quite fool-proof since a package path can still be extended, e.g. > > @IR(counts = {IRNode.ALLOC_OF, "some/package/MyClass", "1"}) > > will also match allocations of `a/prefix/some/package/MyClass`. > > I think it's an acceptable limitation. > > Let's have a little preview of the new `dump_spec`.
For instance, it could have been something like: > > 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) !jvms: Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) > > and now it is > > 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) allocationKlass:java/util/Ha... Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Update copyright year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24093/files - new: https://git.openjdk.org/jdk/pull/24093/files/cb8a71cb..d1922cf4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24093&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24093&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24093/head:pull/24093 PR: https://git.openjdk.org/jdk/pull/24093 From duke at openjdk.org Wed Mar 19 13:26:52 2025 From: duke at openjdk.org (Marc Chevalier) Date: Wed, 19 Mar 2025 13:26:52 GMT Subject: RFR: 8314999: IR framework fails to detect allocation [v5] In-Reply-To: References: Message-ID: <_fqiTjtw4hfPrBjSKEAycD555SGfjOh9Z74qmPE3QFY=.ed45a59c-bc05-4629-8e8c-cabb346f6bc8@github.com> On Wed, 19 Mar 2025 13:01:40 GMT, Christian Hagedorn wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> replace #ifndef PRODUCT with NOT_PRODUCT > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java line 75: > >> 73: runWithArguments(MultipleFailOnGood.class, "-XX:TLABRefillWasteFraction=50"); >> 74: >> 75: runCheck(new String[] 
{"-XX:TLABRefillWasteFraction=50", "-XX:+UsePerfData", "-XX:+UseTLAB"}, BadFailOnConstraint.create(AndOr1.class, "test1(int)", 1, "CallStaticJava")); > > You can also update the copyright year of that file. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24093#discussion_r2003319101 From chagedorn at openjdk.org Wed Mar 19 14:07:08 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 14:07:08 GMT Subject: RFR: 8314999: IR framework fails to detect allocation [v6] In-Reply-To: References: Message-ID: <3O7PeeD1nlXQV6nYmamVHXzXFkuKI-n181myl5PIRxk=.7b0471fc-b600-434f-8689-9ee308edbe2f@github.com> On Wed, 19 Mar 2025 13:26:51 GMT, Marc Chevalier wrote: >> Bring the `ALLOC(_ARRAY)?(_OF)?` IR framework regex into the modern era! >> >> Rather than matching on the OptoAssembly, we now match before macro expansion. Indeed, matching on OptoAssembly is fragile: between the register being assigned the class to allocate and the actual call to `_new_instance_Java`, there might be a register spill, making the match hard and fragile. Now, these regexes are applied before macro expansion. >> >> To make that work, I needed to adapt the dump_spec of `AllocateNode` to print the allocated class, in case it is a precise constant. This is also nice to have as a human reading the output. >> >> The new feature is also slightly more precise in case we want to match the allocation of a given class (that is `ALLOC(_ARRAY)?_OF`). It used to be along the lines of: >> >> "precise .*" + IS_REPLACED + ":" >> >> which is actually too lenient: it only asserts that the suffix is what is expected. On the plus side, if we wanted to have `MyClass`, then `some/package/MyClass` would match, but on the other hand, `ItLooksLikeButIsNotMyClass` would also match.
The new regex uses a word boundary: >> >> "allocationKlass:.*\\b" + IS_REPLACED + "\\s" >> >> which makes it a bit more specific: the given name cannot be extended with only letters: there must be a non-letter char. For instance, `ItLooksLikeButIsNotMyClass` wouldn't work anymore, since there is no word boundary between `ItLooksLikeButIsNot` and `MyClass`. It also cannot be extended on the right into `MyClassButNotReally` because of the space character that must exist after the searched string. >> >> The case of array allocations is slightly more tricky, but essentially similar. >> >> It is not quite fool-proof since a package path can still be extended, e.g. >> >> @IR(counts = {IRNode.ALLOC_OF, "some/package/MyClass", "1"}) >> >> will also match allocations of `a/prefix/some/package/MyClass`. >> >> I think it's an acceptable limitation. >> >> Let's have a little preview of the new `dump_spec`. For instance, it could have been something like: >> >> 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) !jvms: Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) >> >> and now it is >> >> 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/l... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright year Marked as reviewed by chagedorn (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/24093#pullrequestreview-2698594241 From chagedorn at openjdk.org Wed Mar 19 14:36:29 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 14:36:29 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v2] In-Reply-To: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: > The patch fixes the issue of creating an Initialized Assertion Predicate at a loop X from a Template Assertion Predicate that was originally created for a loop Y. Using the unrelated loop values from loop Y for the Initialized Assertion Predicate will let it fail during runtime and we execute a `halt` instruction. This was originally reported with [JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428). > > Note that most of the line changes are from new tests. > > ### The Problem > There are multiple test cases triggering the same problem. In the following, when referring to "the test case", I'm referring to `testTemplateAssertionPredicateNotRemovedHalt()` which was written from scratch and contains more detailed comments explaining how we end up with executing a `Halt` node in more details. > > #### An Inner Loop without Parse Predicates > The graph in `testTemplateAssertionPredicateNotRemovedHalt()` looks like this after creating `LoopNodes` for the outer `for` and inner `while (true)` loop: > > ![image](https://github.com/user-attachments/assets/7ac60e35-0b7e-4f04-b9dd-6eb8c8654a15) > > We only have Parse Predicates for the outer loop. Why? 
> > Before beautify loop, we have the following region which merges multiple backedges - the one from the `for` loop and the one from the `while (true)` loop: > > ![image](https://github.com/user-attachments/assets/7895161d-5ac1-46d6-93fe-5ab90ef24ab9) > > In `IdealLoopTree::merge_many_backedges()`, we notice that the hottest backedge is hot enough such that it is worth to have a separate merge point region for the inner and outer loop. We set everything up and eventually in `IdealLoopTree::split_outer_loop()`, we create a second `LoopNode`. > > For this inner `LoopNode`, we cannot set up `Parse Predicates` with the same UCTs as used for the outer loop. It would be incorrect when taking the trap to re-execute the inner and outer loop again while having already executed some of the outer loop's iterations. Thus, we get the graph shape with back-to-back `LoopNodes` as shown above. > > #### Predicates from a Folded Loop End up at Another Loop > As described in the previous section, we have an inner and outer `LoopNode` while the inner does not have Parse Predicates. In a series of events (see test case comments for more details), we first hoist a range check out of the outer loop during Loop Predication with a Template Assertion Predicate. Then, we fold the outer loop away because we find that it is only running for a single iteration and the bac... Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Small things - Fix test comments - New approach: Marking unrelated Template Assertion Predicates outside of IGVN by storing a reference to the loop it originally was created for. 
- Merge branch 'master' into JDK-8350579 - Revert fix completely - 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away during IGVN ------------- Changes: https://git.openjdk.org/jdk/pull/23823/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23823&range=01 Stats: 709 lines in 9 files changed: 582 ins; 44 del; 83 mod Patch: https://git.openjdk.org/jdk/pull/23823.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23823/head:pull/23823 PR: https://git.openjdk.org/jdk/pull/23823 From chagedorn at openjdk.org Wed Mar 19 14:36:29 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 14:36:29 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away In-Reply-To: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: On Thu, 27 Feb 2025 13:07:46 GMT, Christian Hagedorn wrote: > The patch fixes the issue of creating an Initialized Assertion Predicate at a loop X from a Template Assertion Predicate that was originally created for a loop Y. Using the unrelated loop values from loop Y for the Initialized Assertion Predicate will let it fail during runtime and we execute a `halt` instruction. This was originally reported with [JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428). > > Note that most of the line changes are from new tests. > > ### The Problem > There are multiple test cases triggering the same problem. In the following, when referring to "the test case", I'm referring to `testTemplateAssertionPredicateNotRemovedHalt()` which was written from scratch and contains more detailed comments explaining how we end up with executing a `Halt` node in more details. 
> > #### An Inner Loop without Parse Predicates > The graph in `testTemplateAssertionPredicateNotRemovedHalt()` looks like this after creating `LoopNodes` for the outer `for` and inner `while (true)` loop: > > ![image](https://github.com/user-attachments/assets/7ac60e35-0b7e-4f04-b9dd-6eb8c8654a15) > > We only have Parse Predicates for the outer loop. Why? > > Before beautify loop, we have the following region which merges multiple backedges - the one from the `for` loop and the one from the `while (true)` loop: > > ![image](https://github.com/user-attachments/assets/7895161d-5ac1-46d6-93fe-5ab90ef24ab9) > > In `IdealLoopTree::merge_many_backedges()`, we notice that the hottest backedge is hot enough such that it is worth to have a separate merge point region for the inner and outer loop. We set everything up and eventually in `IdealLoopTree::split_outer_loop()`, we create a second `LoopNode`. > > For this inner `LoopNode`, we cannot set up `Parse Predicates` with the same UCTs as used for the outer loop. It would be incorrect when taking the trap to re-execute the inner and outer loop again while having already executed some of the outer loop's iterations. Thus, we get the graph shape with back-to-back `LoopNodes` as shown above. > > #### Predicates from a Folded Loop End up at Another Loop > As described in the previous section, we have an inner and outer `LoopNode` while the inner does not have Parse Predicates. In a series of events (see test case comments for more details), we first hoist a range check out of the outer loop during Loop Predication with a Template Assertion Predicate. Then, we fold the outer loop away because we find that it is only running for a single iteration and the bac... Thanks Emanuel for your review! Forgot to move this to draft state. As Roland has pointed out, it is quite fragile to do matching during IGVN where you need to handle all kinds of dying predicate shapes.
I'm currently moving to a non-IGVN solution (https://github.com/openjdk/jdk/pull/23941 is a first step). I will then update this patch. The problem to fix is still the same though, just with a different solution. I will get to this once I have integrated some preparatory changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23823#issuecomment-2706018405 From chagedorn at openjdk.org Wed Mar 19 14:36:29 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 14:36:29 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away In-Reply-To: References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: On Thu, 27 Feb 2025 16:46:06 GMT, Roland Westrelin wrote: >> The patch fixes the issue of creating an Initialized Assertion Predicate at a loop X from a Template Assertion Predicate that was originally created for a loop Y. Using the unrelated loop values from loop Y for the Initialized Assertion Predicate will let it fail during runtime and we execute a `halt` instruction. This was originally reported with [JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428). >> >> Note that most of the line changes are from new tests. >> >> ### The Problem >> There are multiple test cases triggering the same problem. In the following, when referring to "the test case", I'm referring to `testTemplateAssertionPredicateNotRemovedHalt()` which was written from scratch and contains more detailed comments explaining how we end up with executing a `Halt` node in more details. >> >> #### An Inner Loop without Parse Predicates >> The graph in `testTemplateAssertionPredicateNotRemovedHalt()` looks like this after creating `LoopNodes` for the outer `for` and inner `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7ac60e35-0b7e-4f04-b9dd-6eb8c8654a15) >> >> We only have Parse Predicates for the outer loop. Why? 
>> >> Before beautify loop, we have the following region which merges multiple backedges - the one from the `for` loop and the one from the `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7895161d-5ac1-46d6-93fe-5ab90ef24ab9) >> >> In `IdealLoopTree::merge_many_backedges()`, we notice that the hottest backedge is hot enough such that it is worth to have a separate merge point region for the inner and outer loop. We set everything up and eventually in `IdealLoopTree::split_outer_loop()`, we create a second `LoopNode`. >> >> For this inner `LoopNode`, we cannot set up `Parse Predicates` with the same UCTs as used for the outer loop. It would be incorrect when taking the trap to re-execute the inner and outer loop again while having already executed some of the outer loop's iterations. Thus, we get the graph shape with back-to-back `LoopNodes` as shown above. >> >> #### Predicates from a Folded Loop End up at Another Loop >> As described in the previous section, we have an inner and outer `LoopNode` while the inner does not have Parse Predicates. In a series of events (see test case comments for more details), we first hoist a range check out of the outer loop during Loop Predication with a Template Assertion Predicate. Then, we fold the outer loop away because we find that it is... > > That looks reasonable to me but making sure that predicates in the process of being removed are properly stepped over feels like something that could be fragile. So I'm wondering if there would be a way to mark predicates as being for a particular loop (maybe storing the loop's node id they apply to in predicate nodes and making sure it's properly updated as loops are cloned etc.) so when there is a mismatch between the loop and predicate it can be detected? @rwestrel @eme64 I pushed an update and added a new section `New Proposed Solution` in the PR description to explain the changes.
I completely reverted the original IGVN based approach and implemented a new one inside `PhaseIdealLoop::eliminate_useless_predicates()`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23823#issuecomment-2736865920 From chagedorn at openjdk.org Wed Mar 19 14:36:30 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 14:36:30 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v2] In-Reply-To: References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: On Wed, 19 Mar 2025 14:33:32 GMT, Christian Hagedorn wrote: >> The patch fixes the issue of creating an Initialized Assertion Predicate at a loop X from a Template Assertion Predicate that was originally created for a loop Y. Using the unrelated loop values from loop Y for the Initialized Assertion Predicate will let it fail during runtime and we execute a `halt` instruction. This was originally reported with [JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428). >> >> Note that most of the line changes are from new tests. >> >> ### The Problem >> There are multiple test cases triggering the same problem. In the following, when referring to "the test case", I'm referring to `testTemplateAssertionPredicateNotRemovedHalt()` which was written from scratch and contains more detailed comments explaining how we end up with executing a `Halt` node in more details. >> >> #### An Inner Loop without Parse Predicates >> The graph in `testTemplateAssertionPredicateNotRemovedHalt()` looks like this after creating `LoopNodes` for the outer `for` and inner `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7ac60e35-0b7e-4f04-b9dd-6eb8c8654a15) >> >> We only have Parse Predicates for the outer loop. Why? 
>> >> Before beautify loop, we have the following region which merges multiple backedges - the one from the `for` loop and the one from the `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7895161d-5ac1-46d6-93fe-5ab90ef24ab9) >> >> In `IdealLoopTree::merge_many_backedges()`, we notice that the hottest backedge is hot enough such that it is worth to have a separate merge point region for the inner and outer loop. We set everything up and eventually in `IdealLoopTree::split_outer_loop()`, we create a second `LoopNode`. >> >> For this inner `LoopNode`, we cannot set up `Parse Predicates` with the same UCTs as used for the outer loop. It would be incorrect when taking the trap to re-execute the inner and outer loop again while having already executed some of the outer loop's iterations. Thus, we get the graph shape with back-to-back `LoopNodes` as shown above. >> >> #### Predicates from a Folded Loop End up at Another Loop >> As described in the previous section, we have an inner and outer `LoopNode` while the inner does not have Parse Predicates. In a series of events (see test case comments for more details), we first hoist a range check out of the outer loop during Loop Predication with a Template Assertion Predicate. Then, we fold the outer loop away because we find that it is... > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Small things > - Fix test comments > - New approach: Marking unrelated Template Assertion Predicates outside of IGVN by storing a reference to the loop it originally was created for. > - Merge branch 'master' into JDK-8350579 > - Revert fix completely > - 8350579: Remove Template Assertion Predicates belonging to a > loop once it is folded away during IGVN src/hotspot/share/opto/loopTransform.cpp line 1703: > 1701: // to the new stride. 
> 1702: void PhaseIdealLoop::update_main_loop_assertion_predicates(CountedLoopNode* main_loop_head) { > 1703: Node* init = main_loop_head->init_trip(); unused src/hotspot/share/opto/loopTransform.cpp line 2025: > 2023: loop_head->clear_strip_mined(); > 2024: > 2025: update_main_loop_assertion_predicates(clone_head, stride_con); Moved down here (see PR description). src/hotspot/share/opto/predicates.cpp line 182: > 180: > 181: // Clone this Template Assertion Predicate without modifying any OpaqueLoop*Node inputs. > 182: TemplateAssertionPredicate TemplateAssertionPredicate::clone(Node* new_control, CountedLoopNode* new_loop_node, Most changes in this file: Changing `phase` -> `_phase` and piping the new `CountedLoopNode` through the code so that we can initialize the new `OpaqueTemplateAssertionPredicate` nodes with them accordingly. src/hotspot/share/opto/predicates.cpp line 218: > 216: // This class is used to replace the input to OpaqueLoopStrideNode with a new node while leaving the other nodes > 217: // unchanged. > 218: class ReplaceOpaqueStrideInput : public BFSActions { Moved this up because we now call it from `TemplateAssertionPredicate` and not `TemplateAssertionPredicateExpression`. src/hotspot/share/opto/predicates.cpp line 218: > 216: DEBUG_ONLY(verify();) > 217: TemplateAssertionExpression expression(opaque_node()); > 218: expression.replace_opaque_stride_input(new_stride, igvn); This was just an indirection; removing it makes the code easier to use. src/hotspot/share/opto/predicates.cpp line 580: > 578: new OpaqueTemplateAssertionPredicateNode(bool_into_opaque_node_clone, new_loop_node); > 579: _phase->C->add_template_assertion_predicate_opaque(opaque_clone); > 580: _phase->register_new_node(opaque_clone, new_control); We don't clone the `OpaqueTemplateAssertionPredicateNode` anymore with `DataNodeGraph::clone_with_opaque_loop_transform_strategy` but directly here. This allows us to easily set the `new_loop_node` for it.
src/hotspot/share/opto/predicates.cpp line 1101: > 1099: const TemplateAssertionPredicate& template_assertion_predicate) { > 1100: TemplateAssertionPredicate cloned_template_assertion_predicate = > 1101: template_assertion_predicate.clone(_old_target_loop_entry, _target_loop_head->as_CountedLoop(), _phase); Moved method from `.hpp` file here because of incomplete `CountedLoopNode` when calling `as_CountedLoop()`. src/hotspot/share/opto/predicates.cpp line 1151: > 1149: } > 1150: replace_opaque_stride_input(template_assertion_predicate); > 1151: template_assertion_predicate.update_associated_loop_node(_loop_node); We don't need to clone the Template Assertion Expression and hence we only need to update the loop node. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2003442970 PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2003443681 PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2003456825 PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2003450551 PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2003451847 PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2003460961 PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2003476267 PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2003480253 From chagedorn at openjdk.org Wed Mar 19 14:36:30 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 14:36:30 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v2] In-Reply-To: References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: On Fri, 7 Mar 2025 09:35:19 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: >> >> - Small things >> - Fix test comments >> - New approach: Marking unrelated Template Assertion Predicates outside of IGVN by storing a reference to the loop it originally was created for. >> - Merge branch 'master' into JDK-8350579 >> - Revert fix completely >> - 8350579: Remove Template Assertion Predicates belonging to a >> loop once it is folded away during IGVN > > test/hotspot/jtreg/compiler/predicates/assertion/TestAssertionPredicates.java line 105: > >> 103: * @test id=NoFlags >> 104: * @bug 8288981 8350579 >> 105: * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:+AbortVMOnCompilationFailure > > Can you explain why you are enabling `AbortVMOnCompilationFailure`? I added an additional comment further up in the test to explain the reason. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2003439418 From chagedorn at openjdk.org Wed Mar 19 14:36:31 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 14:36:31 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v2] In-Reply-To: References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: On Fri, 7 Mar 2025 09:38:42 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/predicates/assertion/TestAssertionPredicates.java line 159: >> >>> 157: >>> 158: // Runs most of the tests except the really time-consuming ones. >>> 159: static void runAllTests() { >> >> Sounds like a bit of a contradiction ? >> >> `runAllTests` -> `runAllFastTests`? > > Which ones are the really time-consuming ones? And why do you not run them here? I removed the comment for now - we run all tests here. This only applied to the full test added with JDK-8350577 from which I extracted these bits. 
I can revisit this comment and method name there again :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2003441045 From chagedorn at openjdk.org Wed Mar 19 15:13:18 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 15:13:18 GMT Subject: RFR: 8352131: [REDO] C2: Print compilation bailouts with PrintCompilation compile command Message-ID: <_1_FOdNfmiNZYLckVHCJsnWjXnDgPoSxrMqwc9FmJvc=.b66f1b60-e3a6-4e3c-833f-1f7f2d65eaee@github.com> We currently only print a compilation bailout with `-XX:+PrintCompilation`: 7782 90 b 4 Test::main (19 bytes) 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. The original fix with https://github.com/openjdk/jdk/pull/24031 missed the following: We release the memory for the directives here: https://github.com/openjdk/jdk/blob/fed34e46b89bc9b0462d9b5f5e5ab5516fe18c6e/src/hotspot/share/compiler/compileBroker.cpp#L2343 and then wrongly accessed the memory again to fetch `PrintCompilationOption` on this line: https://github.com/openjdk/jdk/blob/fed34e46b89bc9b0462d9b5f5e5ab5516fe18c6e/src/hotspot/share/compiler/compileBroker.cpp#L2377 This worked most of the time because the memory had not been overwritten yet, but it is of course a use-after-free. We only noticed this in some intermittent test failures where `COMPILE SKIPPED` was suddenly dumped although the tests did not expect it. I now moved the memory release down after we access it for `PrintCompilationOption`.
Thanks, Christian ------------- Commit messages: - 8352131: [REDO] C2: Print compilation bailouts with PrintCompilation compile command Changes: https://git.openjdk.org/jdk/pull/24117/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24117&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352131 Stats: 4 lines in 1 file changed: 2 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24117.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24117/head:pull/24117 PR: https://git.openjdk.org/jdk/pull/24117 From thartmann at openjdk.org Wed Mar 19 15:32:12 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 19 Mar 2025 15:32:12 GMT Subject: RFR: 8352131: [REDO] C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: <_1_FOdNfmiNZYLckVHCJsnWjXnDgPoSxrMqwc9FmJvc=.b66f1b60-e3a6-4e3c-833f-1f7f2d65eaee@github.com> References: <_1_FOdNfmiNZYLckVHCJsnWjXnDgPoSxrMqwc9FmJvc=.b66f1b60-e3a6-4e3c-833f-1f7f2d65eaee@github.com> Message-ID: On Wed, 19 Mar 2025 15:08:56 GMT, Christian Hagedorn wrote: > We currently only print a compilation bailout with `-XX:+PrintCompilation`: > > 7782 90 b 4 Test::main (19 bytes) > 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout > > But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. > > The original fix with https://github.com/openjdk/jdk/pull/24031 missed the following: We release the memory for the directives here: > https://github.com/openjdk/jdk/blob/fed34e46b89bc9b0462d9b5f5e5ab5516fe18c6e/src/hotspot/share/compiler/compileBroker.cpp#L2343 > > and then wrongly accessed the memory again to fetch `PrintCompilationOption` on this line: > https://github.com/openjdk/jdk/blob/fed34e46b89bc9b0462d9b5f5e5ab5516fe18c6e/src/hotspot/share/compiler/compileBroker.cpp#L2377 > > This worked most of the time because the memory was not overridden, yet, but of course is a wrong use after free case. 
We only noticed this in some intermittent test failures where we suddenly dumped `COMPILE SKIPPED` where it was unexpected for tests. > > I now moved the memory release down after we access it for `PrintCompilationOption`. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24117#pullrequestreview-2698982550 From chagedorn at openjdk.org Wed Mar 19 16:05:09 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 16:05:09 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v4] In-Reply-To: <5_7_WclfMhPtMey2k2Ty5ryRx3PTuqLNyG7kjWlEOlA=.338d31a7-1446-4864-9a77-acce761efd31@github.com> References: <5_7_WclfMhPtMey2k2Ty5ryRx3PTuqLNyG7kjWlEOlA=.338d31a7-1446-4864-9a77-acce761efd31@github.com> Message-ID: On Wed, 12 Mar 2025 16:08:14 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. >> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". >> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Some small comments, otherwise, it looks good to me. 
test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java line 31: > 29: * @run driver compiler.c2.TestAndConZeroCCP > 30: */ > 31: package compiler.c2; You should update this to `ccp` now that you moved the test and use `main` instead of `driver`. Otherwise, `@run driver` is never executed with additionally passed in flags, for example in higher tier. Suggestion: * @run main/othervm -Xbatch -XX:-TieredCompilation compiler.ccp.TestAndConZeroCCP * @run driver compiler.ccp.TestAndConZeroCCP */ package compiler.ccp; test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java line 37: > 35: public class TestAndConZeroCCP { > 36: > 37: public static void main(String[] args) { You should use a 4 space indentation for Java tests. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23871#pullrequestreview-2699111257 PR Review Comment: https://git.openjdk.org/jdk/pull/23871#discussion_r2003703218 PR Review Comment: https://git.openjdk.org/jdk/pull/23871#discussion_r2003708656 From chagedorn at openjdk.org Wed Mar 19 16:08:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 16:08:06 GMT Subject: RFR: 8352131: [REDO] C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: <_1_FOdNfmiNZYLckVHCJsnWjXnDgPoSxrMqwc9FmJvc=.b66f1b60-e3a6-4e3c-833f-1f7f2d65eaee@github.com> References: <_1_FOdNfmiNZYLckVHCJsnWjXnDgPoSxrMqwc9FmJvc=.b66f1b60-e3a6-4e3c-833f-1f7f2d65eaee@github.com> Message-ID: On Wed, 19 Mar 2025 15:08:56 GMT, Christian Hagedorn wrote: > We currently only print a compilation bailout with `-XX:+PrintCompilation`: > > 7782 90 b 4 Test::main (19 bytes) > 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout > > But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. 
> > The original fix with https://github.com/openjdk/jdk/pull/24031 missed the following: We release the memory for the directives here: > https://github.com/openjdk/jdk/blob/fed34e46b89bc9b0462d9b5f5e5ab5516fe18c6e/src/hotspot/share/compiler/compileBroker.cpp#L2343 > > and then wrongly accessed the memory again to fetch `PrintCompilationOption` on this line: > https://github.com/openjdk/jdk/blob/fed34e46b89bc9b0462d9b5f5e5ab5516fe18c6e/src/hotspot/share/compiler/compileBroker.cpp#L2377 > > This worked most of the time because the memory was not overridden, yet, but of course is a wrong use after free case. We only noticed this in some intermittent test failures where we suddenly dumped `COMPILE SKIPPED` where it was unexpected for tests. > > I now moved the memory release down after we access it for `PrintCompilationOption`. > > Thanks, > Christian Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24117#issuecomment-2737215042 From cushon at openjdk.org Wed Mar 19 16:11:35 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Wed, 19 Mar 2025 16:11:35 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v5] In-Reply-To: References: Message-ID: > Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. > > https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. > This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these > transitive uses to the worklist. > > The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for > CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". 
> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. Liam Miller-Cushon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into JDK-8350563 - Reformat test and update package to ccp - Review comments - Update test/hotspot/jtreg/compiler/c2/TestAndConZeroCCP.java Co-authored-by: Emanuel Peter - copyright - style - Merge branch 'openjdk:master' into mernst/JDK-8350563 - RegTest - Merge branch 'openjdk:master' into mernst/JDK-8350563 - push `con->(cast*)->and` uses ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23871/files - new: https://git.openjdk.org/jdk/pull/23871/files/f28c1d46..291b611a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=03-04 Stats: 43477 lines in 622 files changed: 20451 ins; 15008 del; 8018 mod Patch: https://git.openjdk.org/jdk/pull/23871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23871/head:pull/23871 PR: https://git.openjdk.org/jdk/pull/23871 From cushon at openjdk.org Wed Mar 19 16:11:35 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Wed, 19 Mar 2025 16:11:35 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v4] In-Reply-To: References: <5_7_WclfMhPtMey2k2Ty5ryRx3PTuqLNyG7kjWlEOlA=.338d31a7-1446-4864-9a77-acce761efd31@github.com> Message-ID: On Wed, 19 Mar 2025 15:59:20 GMT, Christian Hagedorn wrote: >> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java line 31: > >> 29: * @run driver compiler.c2.TestAndConZeroCCP >> 30: */ >> 31: package 
compiler.c2; > > You should update this to `ccp` now that you moved the test and use `main` instead of `driver`. Otherwise, `@run driver` is never executed with additionally passed in flags, for example in higher tier. > Suggestion: > > * @run main/othervm -Xbatch -XX:-TieredCompilation compiler.ccp.TestAndConZeroCCP > * @run driver compiler.ccp.TestAndConZeroCCP > */ > package compiler.ccp; Done > test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java line 37: > >> 35: public class TestAndConZeroCCP { >> 36: >> 37: public static void main(String[] args) { > > You should use a 4 space indentation for Java tests. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23871#discussion_r2003722869 PR Review Comment: https://git.openjdk.org/jdk/pull/23871#discussion_r2003722755 From cushon at openjdk.org Wed Mar 19 16:18:33 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Wed, 19 Mar 2025 16:18:33 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v6] In-Reply-To: References: Message-ID: > Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. > > https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. > This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these > transitive uses to the worklist. > > The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for > CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". > We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. 
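[Editor's sketch] The worklist idea in the quoted description ("push `con->(cast*)->and` uses") can be pictured with a toy graph model. This is only an illustration under stated assumptions: `CcpWorklistSketch`, `Node`, `Kind`, and `andUsesThroughCasts` are hypothetical names invented for the example and do not correspond to HotSpot's C++ node classes or to the actual change in the PR.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CcpWorklistSketch {
    enum Kind { CON, CAST, AND, OTHER }

    // Toy node: a name, a kind, and the list of nodes that use it as an input.
    static final class Node {
        final String name;
        final Kind kind;
        final List<Node> uses = new ArrayList<>();
        Node(String name, Kind kind) { this.name = name; this.kind = kind; }
        Node usedBy(Node user) { uses.add(user); return this; }
    }

    // Collect the And nodes reachable from 'changed' through zero or more Cast uses.
    // In this toy model, these are the nodes that would be pushed to the worklist.
    static List<Node> andUsesThroughCasts(Node changed) {
        List<Node> toPush = new ArrayList<>();
        Set<Node> visitedCasts = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(changed);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            for (Node use : current.uses) {
                if (use.kind == Kind.AND) {
                    if (!toPush.contains(use)) {
                        toPush.add(use);
                    }
                } else if (use.kind == Kind.CAST && visitedCasts.add(use)) {
                    pending.push(use); // look through the cast to its own uses
                }
                // Other node kinds are not looked through in this sketch.
            }
        }
        return toPush;
    }

    // Builds con -> cast1 -> cast2 -> and, plus con -> other -> and2, and returns the pushed names.
    static List<String> demo() {
        Node con = new Node("con", Kind.CON);
        Node cast1 = new Node("cast1", Kind.CAST);
        Node cast2 = new Node("cast2", Kind.CAST);
        Node and = new Node("and", Kind.AND);
        Node other = new Node("other", Kind.OTHER);
        Node and2 = new Node("and2", Kind.AND);
        con.usedBy(cast1).usedBy(other);
        cast1.usedBy(cast2);
        cast2.usedBy(and);
        other.usedBy(and2);
        List<String> names = new ArrayList<>();
        for (Node n : andUsesThroughCasts(con)) {
            names.add(n.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model, `and2` is not collected because it is only reachable through a non-cast node; whether the real CCP code treats such paths the same way is not stated in the thread.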
Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23871/files - new: https://git.openjdk.org/jdk/pull/23871/files/291b611a..8554ea87 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23871/head:pull/23871 PR: https://git.openjdk.org/jdk/pull/23871 From chagedorn at openjdk.org Wed Mar 19 16:18:35 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Mar 2025 16:18:35 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v5] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 16:11:35 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. >> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". >> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. > > Liam Miller-Cushon has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into JDK-8350563 > - Reformat test and update package to ccp > - Review comments > - Update test/hotspot/jtreg/compiler/c2/TestAndConZeroCCP.java > > Co-authored-by: Emanuel Peter > - copyright > - style > - Merge branch 'openjdk:master' into mernst/JDK-8350563 > - RegTest > - Merge branch 'openjdk:master' into mernst/JDK-8350563 > - push `con->(cast*)->and` uses test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java line 29: > 27: * @summary Test that And nodes are added to the CCP worklist if they have a constant as input. > 28: * @run main/othervm -Xbatch -XX:-TieredCompilation compiler.ccp.TestAndConZeroCCP > 29: * @run driver compiler.ccp.TestAndConZeroCCP Forgot the 2nd update in the suggestion: Suggestion: * @run main compiler.ccp.TestAndConZeroCCP ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23871#discussion_r2003740004 From epeter at openjdk.org Wed Mar 19 16:20:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Mar 2025 16:20:12 GMT Subject: RFR: 8351627: C2 AArch64 ROR/ROL: assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value [v2] In-Reply-To: References: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> Message-ID: <9Pcu-Z727pHCWLrn45HKnsOZNH0ZtvsyhUXb0UQ9xB8=.5d541b21-cbe3-49dd-a0a1-20d83209b45e@github.com> On Tue, 18 Mar 2025 05:42:18 GMT, Xiaohong Gong wrote: >> Looks good. > > Hi @adinn , test has been updated. Thanks for your reviewing! > Hi @chhagedorn could you please help to take a look at this PR? Thanks a lot! @XiaohongGong thanks for looking at this! Patch looks reasonable. I'll launch some tests. @jatin-bhateja Can you also have a quick look at this, you originally wrote some of the VM code here. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24051#issuecomment-2737258407 From kvn at openjdk.org Wed Mar 19 16:28:06 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 19 Mar 2025 16:28:06 GMT Subject: RFR: 8352131: [REDO] C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: <_1_FOdNfmiNZYLckVHCJsnWjXnDgPoSxrMqwc9FmJvc=.b66f1b60-e3a6-4e3c-833f-1f7f2d65eaee@github.com> References: <_1_FOdNfmiNZYLckVHCJsnWjXnDgPoSxrMqwc9FmJvc=.b66f1b60-e3a6-4e3c-833f-1f7f2d65eaee@github.com> Message-ID: On Wed, 19 Mar 2025 15:08:56 GMT, Christian Hagedorn wrote: > We currently only print a compilation bailout with `-XX:+PrintCompilation`: > > 7782 90 b 4 Test::main (19 bytes) > 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout > > But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. > > The original fix with https://github.com/openjdk/jdk/pull/24031 missed the following: We release the memory for the directives here: > https://github.com/openjdk/jdk/blob/fed34e46b89bc9b0462d9b5f5e5ab5516fe18c6e/src/hotspot/share/compiler/compileBroker.cpp#L2343 > > and then wrongly accessed the memory again to fetch `PrintCompilationOption` on this line: > https://github.com/openjdk/jdk/blob/fed34e46b89bc9b0462d9b5f5e5ab5516fe18c6e/src/hotspot/share/compiler/compileBroker.cpp#L2377 > > This worked most of the time because the memory was not overridden, yet, but of course is a wrong use after free case. We only noticed this in some intermittent test failures where we suddenly dumped `COMPILE SKIPPED` where it was unexpected for tests. > > I now moved the memory release down after we access it for `PrintCompilationOption`. > > Thanks, > Christian Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24117#pullrequestreview-2699230579

From bulasevich at openjdk.org Wed Mar 19 16:58:08 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Wed, 19 Mar 2025 16:58:08 GMT
Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache
In-Reply-To:
References:
Message-ID:

On Thu, 27 Feb 2025 22:23:23 GMT, Evgeny Astigeevich wrote:

> This benchmark is used to check performance impact of the code cache being sparse.
>
> We use C2 compiler to compile the same Java method multiple times to produce as many code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods.
>
> Results: code region size 2M (2097152) bytes
> - Intel Xeon Platinum 8259CL
>
> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff |
> |--- |--- |--- |--- |--- |--- |--- |
> |128 |1 |128 |19.577 |0.619 |us/op | |
> |128 |32 |4 |22.968 |0.314 |us/op |17.30% |
> |128 |48 |3 |22.245 |0.388 |us/op |13.60% |
> |128 |64 |2 |23.874 |0.84 |us/op |21.90% |
> |128 |80 |2 |23.786 |0.231 |us/op |21.50% |
> |128 |96 |1 |26.224 |1.16 |us/op |34% |
> |128 |112 |1 |27.028 |0.461 |us/op |38.10% |
> |256 |1 |256 |47.43 |1.146 |us/op | |
> |256 |32 |8 |63.962 |1.671 |us/op |34.90% |
> |256 |48 |5 |63.396 |0.247 |us/op |33.70% |
> |256 |64 |4 |66.604 |2.286 |us/op |40.40% |
> |256 |80 |3 |59.746 |1.273 |us/op |26% |
> |256 |96 |3 |63.836 |1.034 |us/op |34.60% |
> |256 |112 |2 |63.538 |1.814 |us/op |34% |
> |512 |1 |512 |172.731 |4.409 |us/op | |
> |512 |32 |16 |206.772 |6.229 |us/op |19.70% |
> |512 |48 |11 |215.275 |2.228 |us/op |24.60% |
> |512 |64 |8 |212.962 |2.028 |us/op |23.30% |
> |512 |80 |6 |201.335 |12.519 |us/op |16.60% |
> |512 |96 |5 |198.133 |6.502 |us/op |14.70% |
> |512 |112 |5 |193.739 |3.812 |us/op |12.20% |
> |768 |1 |768 |325.154 |5.048 |us/op | |
> |768 |32 |24 |346.298 |20.196 |us/op |6.50% |
> |768 |48 |16 |350.746 |2.931 |us/op |7.90% |
> |768 |64 |12 |339.445 |7.927 |us/op |4.40% |
> |768 |80 |10 |347.408 |7.355 |us/op |6.80% |
> |768 |96 |8 |340.983 |3.578 |us/op |4.90% |
> |768 |112 |7 |353.949 |2.98 |us/op |8.90% |
> |1024 |1 |1024 |368.352 |5.961 |us/op | |
> |1024 |32 |32 |463.822 |6.274 |us/op |25.90% |
> |1024 |48 |21 |457.674 |15.144 |us/op |24.20% |
> |1024 |64 |16 |477.694 |0.986 |us/op |29.70% |
> |1024 |80 |13 |484.901 |32.601 |us/op |31.60% |
> |1024 |96 |11 |480.8 |27.088 |us/op |30.50% |
> |1024 |112 |9 |474.416 |10.053 |us/op |28.80% |
>
> - AArch64 Neoverse N1
>
> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff |
> |--- |--- |--- |--- |--- |--- |--- |
> |128 |1 |128 |25.297 |0.792 |us/op | |
> |128 |32 |4 |31.451...

Hi Evgeny,

I ran the benchmark on my machines (s390 and riscv64 are virtual machines, so I do not trust them much). The result is that code sparsity affects performance on Graviton4, s390 and POWER9. RISC-V in my hands does not care about code sparsity, but shows dramatic degradation as the amount of code increases.

Here is the raw data:

.. | ? | G4 | ? | S390 | ? | POWER9 | ? | riscv64 | ?
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
activeMethodCount | groupCount | us/op | ? | us/op | ? | us/op | ? | us/op | ?
128 | 1 | 11.972 | 0.004 | 21.577 | 0.042 | 27.585 | 0.749 | 108.109 | 0.669
128 | 32 | 13.622 | 0.092 | 24.762 | 0.149 | 34.682 | 2.468 | 107.09 | 0.507
128 | 48 | 13.217 | 0.072 | 25.094 | 0.014 | 35.657 | 0.913 | 108.862 | 0.43
128 | 64 | 13.668 | 0.04 | 25.581 | 0.056 | 34.857 | 0.841 | 109.416 | 0.258
128 | 80 | 13.986 | 0.127 | 25.74 | 0.071 | 36.264 | 0.873 | 110.196 | 0.29
128 | 96 | 14.594 | 0.055 | 26.033 | 0.058 | 36.734 | 0.672 | 111.411 | 0.602
128 | 112 | 14.77 | 0.078 | 27.594 | 0.033 | 36.482 | 1.513 | 112.238 | 0.6
256 | 1 | 23.998 | 0.019 | 45.146 | 0.131 | 68.831 | 1.058 | 224.967 | 0.392
256 | 32 | 26.273 | 0.036 | 52.402 | 0.038 | 71.686 | 4.776 | 217.511 | 1.667
256 | 48 | 26.61 | 0.063 | 52.949 | 0.317 | 70.867 | 2.41 | 220.549 | 0.41
256 | 64 | 26.959 | 0.085 | 53.824 | 0.367 | 72.771 | 1.423 | 220.805 | 0.952
256 | 80 | 27.646 | 0.089 | 53.927 | 1.035 | 73.949 | 2.102 | 220.814 | 0.498
256 | 96 | 27.829 | 0.128 | 54.665 | 0.029 | 75.791 | 3.527 | 222.571 | 0.875
256 | 112 | 28.298 | 0.064 | 53.902 | 0.237 | 75.996 | 3.266 | 224.626 | 1.752
512 | 1 | 48.181 | 0.032 | 88.372 | 0.299 | 147.922 | 7.862 | 487.557 | 1.454
512 | 32 | 53.157 | 0.044 | 108.089 | 0.124 | 151.998 | 3.999 | 462.369 | 0.917
512 | 48 | 55.13 | 0.052 | 109.149 | 0.77 | 160.646 | 28.419 | 456.265 | 1.198
512 | 64 | 56.609 | 0.123 | 110.346 | 0.729 | 158.9 | 16.885 | 464.728 | 3.811
512 | 80 | 57.146 | 0.091 | 110.808 | 0.295 | 157.446 | 11.494 | 454.655 | 4.941
512 | 96 | 59.038 | 0.092 | 111.117 | 0.101 | 154.412 | 5.113 | 465.095 | 1.281
512 | 112 | 60.647 | 0.331 | 110.216 | 0.153 | 155.93 | 9.2 | 489.859 | 0.988
768 | 1 | 77.086 | 0.402 | 139.595 | 0.839 | 191.497 | 6.112 | 1998.335 | 5729.012
768 | 32 | 89.599 | 0.14 | 159.535 | 0.816 | 230.192 | 2.105 | 1663.619 | 5404.593
768 | 48 | 94.312 | 0.33 | 164.865 | 0.493 | 234.917 | 12.344 | 1737.604 | 5687.615
768 | 64 | 94.243 | 0.218 | 166.708 | 0.498 | 234.764 | 10.555 | 1717.1 | 5527.53
768 | 80 | 95.566 | 0.068 | 167.759 | 0.067 | 235.179 | 9.158 | 1732.491 | 5585.148
768 | 96 | 99.435 | 0.323 | 168.27 | 0.201 | 232.356 | 5.571 | 1926.957 | 6162.978
768 | 112 | 105.814 | 0.366 | 167.955 | 0.188 | 234.879 | 4.964 | 1876.117 | 6096.535
1024 | 1 | 110.407 | 1.27 | 198.679 | 1.541 | 251.436 | 14.05 | 6632.683 | 4073.64
1024 | 32 | 137.626 | 1.62 | 215.316 | 0.422 | 290.579 | 8.847 | 6546.788 | 3998.059
1024 | 48 | 141.191 | 0.372 | 216.638 | 1.415 | 295.236 | 21.935 | 6523.087 | 4009.421
1024 | 64 | 141.227 | 0.238 | 218.441 | 1.636 | 299.975 | 5.916 | 6356.841 | 4165.066
1024 | 80 | 148.555 | 0.157 | 220.563 | 0.21 | 298.32 | 11.28 | 6321.32 | 4617.812
1024 | 96 | 155.47 | 0.321 | 218.799 | 0.431 | 298.863 | 18.88 | 6431.995 | 4325.676
1024 | 112 | 158.288 | 0.568 | 219.812 | 0.955 | 290.01 | 8.248 | 6262.742 | 4558.472

And let me post some pictures. Here is the most simple and evident one. It says: sparsity matters. We observe approximately a 20% performance degradation on AArch, S390, and Power9 when we split methods into 32/128 distant groups.

![image](https://github.com/user-attachments/assets/1c46b5da-6c88-4f57-8955-c82658bb512b)

Here is a broader picture. I normalized the data to align different platforms and compare the time per single method call. We see that sparsity matters, and the amount of code is also important. I do not include my RISC-V results here, as its performance behaves erratically as the amount of code increases. And I believe this is not a real effect but rather a peculiarity of the virtual machine.
![image](https://github.com/user-attachments/assets/f1943df2-0156-4186-b4b0-f0e98e816aba) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2737391773 From duke at openjdk.org Wed Mar 19 17:17:24 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 19 Mar 2025 17:17:24 GMT Subject: RFR: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 [v5] In-Reply-To: <2VksBtd_XqgUIQpirjTmAkXUpVZPtahtmLfIoEVRC0A=.895aa101-0baa-461c-970d-b95f146a4f9a@github.com> References: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> <2VksBtd_XqgUIQpirjTmAkXUpVZPtahtmLfIoEVRC0A=.895aa101-0baa-461c-970d-b95f146a4f9a@github.com> Message-ID: On Mon, 17 Mar 2025 14:08:37 GMT, Andrew Dinn wrote: >> This PR reworks the existing AArch64 ML_DSA intrinsic code generator to make it clearer to read and easier to maintain. > > Andrew Dinn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - use references and const to avoid VSeq copying and fix int array arg issue > - fix comment > - fix invalid register argument > - fix errors in comments > - fix whitespace errors > - Clearer implementation of AArch64 dilithium generator It all looks great to me, the comments are good (a little too verbose for my taste, but they explain well what is happening in the code). 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24026#issuecomment-2737446124

From kvn at openjdk.org Wed Mar 19 17:21:08 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 19 Mar 2025 17:21:08 GMT
Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer
In-Reply-To:
References:
Message-ID:

On Wed, 19 Mar 2025 00:03:19 GMT, Dean Long wrote:

>> Before [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789) `relocation_begin()` was never null even when there were no relocations - it pointed to the beginning of constant or code section in such case. It was used by relocation code to simplify code and avoid null checks.
>> With that fix `relocation_begin()` points to address in `CodeBlob::_mutable_data` field which could be `nullptr` if there is no relocation and metadata.
>>
>> The easy fix is to avoid `nullptr` in `CodeBlob::_mutable_data`. We could do that similar to what we do for `nmethod::_immutable_data`: [nmethod.cpp#L1514](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1514).
>>
>> Tested tier1-4, stress, xcomp. Verified with failed tests listed in bug report.
>
> src/hotspot/share/code/codeBlob.cpp line 156:
>
>> 154:   } else {
>> 155:     // We need unique and valid not null address
>> 156:     _mutable_data = blob_end();
>
> It makes me a little nervous pointing this value to real data. When RelocIterator computes `_current = nm->relocation_begin() - 1`, it should never read or write from that address, but how can we guarantee that? Any non-null address that is guaranteed unmapped would do, or a special protected page like `bad_page` here: https://github.com/openjdk/jdk/blob/8e530633a9d99d7ce585cafd5573cb89212feee7/src/hotspot/share/runtime/safepointMechanism.cpp#L66. If using protected memory seems like overkill, then I suggest using a static. Something like this:
>
>     static union {
>       relocInfo _dummy[1];
>     } _empty[2];
>     [...]
>     _mutable_data = _empty+1;
>
> However, I think this is not the first time we have run into this issue with RelocIterator. Maybe it's time that we rewrote it to avoid this situation?

How about this?:

+++ b/src/hotspot/share/code/relocInfo.cpp
@@ -117,6 +117,8 @@ void relocInfo::change_reloc_info_for_address(RelocIterator *itr, address pc, re

 // Implementation of RelocIterator

+static relocInfo dummy_reloc[2];
+
 void RelocIterator::initialize(nmethod* nm, address begin, address limit) {
   initialize_misc();

@@ -127,8 +129,14 @@ void RelocIterator::initialize(nmethod* nm, address begin, address limit) {
   guarantee(nm != nullptr, "must be able to deduce nmethod from other arguments");

   _code = nm;
-  _current = nm->relocation_begin() - 1;
-  _end = nm->relocation_end();
+  // Check for no relocations case and use dummy data to avoid referencing wrong data.
+  if (nm->relocation_size() == 0) {
+    _current = dummy_reloc;
+    _end = dummy_reloc + 1;
+  } else {
+    _current = nm->relocation_begin() - 1;
+    _end = nm->relocation_end();
+  }
   _addr = nm->content_begin();

   // Initialize code sections.

I filed RFE: [JDK-8352426](https://bugs.openjdk.org/browse/JDK-8352426) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24102#discussion_r2003871579 From adinn at openjdk.org Wed Mar 19 17:26:12 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 19 Mar 2025 17:26:12 GMT Subject: RFR: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 [v5] In-Reply-To: References: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> <2VksBtd_XqgUIQpirjTmAkXUpVZPtahtmLfIoEVRC0A=.895aa101-0baa-461c-970d-b95f146a4f9a@github.com> Message-ID: <6GwUAOIqx7uhZNEjds4FR9L0SHVHLOd3xT2h1qNOTn0=.14c3ceb6-b7f8-4282-ac76-416cc752d715@github.com> On Wed, 19 Mar 2025 17:14:08 GMT, Ferenc Rakoczi wrote: >> Andrew Dinn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - use references and const to avoid VSeq copying and fix int array arg issue >> - fix comment >> - fix invalid register argument >> - fix errors in comments >> - fix whitespace errors >> - Clearer implementation of AArch64 dilithium generator > > It all looks great to me, the comments are good (a little too verbose for my taste, but they explain well what is happening in the code). @ferakocz Thanks. I'll integrate this now. I'll also propose a restructuring of the ML_KEM implementation using VSeq as part of a review of that PR. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24026#issuecomment-2737470277 From adinn at openjdk.org Wed Mar 19 17:26:13 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 19 Mar 2025 17:26:13 GMT Subject: Integrated: 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 In-Reply-To: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> References: <84NyPLCnDh5IomQelwjeDs6ihk3NS7gcsEhOR_tBw5E=.b8c70ebe-d714-4dca-a30f-6b3c86161e23@github.com> Message-ID: <4yUFM12VPBUIqVlSHnWyM4L85GHpLH5K0w4TCcgcfM4=.b68e3833-976f-43ab-b303-9dd7c0f98f18@github.com> On Thu, 13 Mar 2025 08:57:18 GMT, Andrew Dinn wrote: > This PR reworks the existing AArch64 ML_DSA intrinsic code generator to make it clearer to read and easier to maintain. This pull request has now been integrated. Changeset: ac3ad03a Author: Andrew Dinn URL: https://git.openjdk.org/jdk/commit/ac3ad03a3f946fbff147732c5f403c8dc445eed8 Stats: 983 lines in 3 files changed: 399 ins; 304 del; 280 mod 8350589: Investigate cleaner implementation of AArch64 ML-DSA intrinsic introduced in JDK-8348561 Reviewed-by: dlong ------------- PR: https://git.openjdk.org/jdk/pull/24026 From kvn at openjdk.org Wed Mar 19 17:52:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 19 Mar 2025 17:52:46 GMT Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: > Before [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789) `relocation_begin()` was never null even when there was no relocations - it pointed to the beginning of constant or code section in such case. It was used by relocation code to simplify code and avoid null checks. > With that fix `relocation_begin()` points to address in `CodeBlob::_mutable_data` field which could be `nullptr` if there is no relocation and metadata. 
> > There easy fix is to avoid `nullptr` in `CodeBlob::_mutable_data`. We could do that similar to what we do for `nmethod::_immutable_data`: [nmethod.cpp#L1514](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1514). > > Tested tier1-4, stress, xcomp. Verified with failed tests listed in bug report. Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Update field default setting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24102/files - new: https://git.openjdk.org/jdk/pull/24102/files/d8b8cf31..cc14ad2a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24102&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24102&range=00-01 Stats: 5 lines in 1 file changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24102.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24102/head:pull/24102 PR: https://git.openjdk.org/jdk/pull/24102 From eastigeevich at openjdk.org Wed Mar 19 18:15:08 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 19 Mar 2025 18:15:08 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 16:55:56 GMT, Boris Ulasevich wrote: >> This benchmark is used to check performance impact of the code cache being sparse. >> >> We use C2 compiler to compile the same Java method multiple times to produce as many code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods. 
>> >> Results: code region size 2M (2097152) bytes >> - Intel Xeon Platinum 8259CL >> >> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | >> |--- |--- |--- |--- |--- |--- |--- | >> |128 |1 |128 |19.577 |0.619 |us/op | | >> |128 |32 |4 |22.968 |0.314 |us/op |17.30% | >> |128 |48 |3 |22.245 |0.388 |us/op |13.60% | >> |128 |64 |2 |23.874 |0.84 |us/op |21.90% | >> |128 |80 |2 |23.786 |0.231 |us/op |21.50% | >> |128 |96 |1 |26.224 |1.16 |us/op |34% | >> |128 |112 |1 |27.028 |0.461 |us/op |38.10% | >> |256 |1 |256 |47.43 |1.146 |us/op | | >> |256 |32 |8 |63.962 |1.671 |us/op |34.90% | >> |256 |48 |5 |63.396 |0.247 |us/op |33.70% | >> |256 |64 |4 |66.604 |2.286 |us/op |40.40% | >> |256 |80 |3 |59.746 |1.273 |us/op |26% | >> |256 |96 |3 |63.836 |1.034 |us/op |34.60% | >> |256 |112 |2 |63.538 |1.814 |us/op |34% | >> |512 |1 |512 |172.731 |4.409 |us/op | | >> |512 |32 |16 |206.772 |6.229 |us/op |19.70% | >> |512 |48 |11 |215.275 |2.228 |us/op |24.60% | >> |512 |64 |8 |212.962 |2.028 |us/op |23.30% | >> |512 |80 |6 |201.335 |12.519 |us/op |16.60% | >> |512 |96 |5 |198.133 |6.502 |us/op |14.70% | >> |512 |112 |5 |193.739 |3.812 |us/op |12.20% | >> |768 |1 |768 |325.154 |5.048 |us/op | | >> |768 |32 |24 |346.298 |20.196 |us/op |6.50% | >> |768 |48 |16 |350.746 |2.931 |us/op |7.90% | >> |768 |64 |12 |339.445 |7.927 |us/op |4.40% | >> |768 |80 |10 |347.408 |7.355 |us/op |6.80% | >> |768 |96 |8 |340.983 |3.578 |us/op |4.90% | >> |768 |112 |7 |353.949 |2.98 |us/op |8.90% | >> |1024 |1 |1024 |368.352 |5.961 |us/op | | >> |1024 |32 |32 |463.822 |6.274 |us/op |25.90% | >> |1024 |48 |21 |457.674 |15.144 |us/op |24.20% | >> |1024 |64 |16 |477.694 |0.986 |us/op |29.70% | >> |1024 |80 |13 |484.901 |32.601 |us/op |31.60% | >> |1024 |96 |11 |480.8 |27.088 |us/op |30.50% | >> |1024 |112 |9 |474.416 |10.053 |us/op |28.80% | >> >> - AArch64 Neoverse N1 >> >> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff |... 
> > Hi Evgeny, > > I ran the benchmark on my machines (s390 and riscv64 are virtual machines, so I do not trust them much). The result is that code sparsity affects performance on Graviton4, s390 and POWER9. RISC-V in my hands does not care about code sparsity, but shows dramatic degradation as the amount of code increases. > > Here is the raw data: > > .. | ? | G4 | ? | S390 | ? | POWER9 | ? | riscv64 | ? > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > activeMethodCount | groupCount | us/op | ? | us/op | ? | us/op | ? | us/op | ? > 128 | 1 | 11.972 | 0.004 | 21.577 | 0.042 | 27.585 | 0.749 | 108.109 | 0.669 > 128 | 32 | 13.622 | 0.092 | 24.762 | 0.149 | 34.682 | 2.468 | 107.09 | 0.507 > 128 | 48 | 13.217 | 0.072 | 25.094 | 0.014 | 35.657 | 0.913 | 108.862 | 0.43 > 128 | 64 | 13.668 | 0.04 | 25.581 | 0.056 | 34.857 | 0.841 | 109.416 | 0.258 > 128 | 80 | 13.986 | 0.127 | 25.74 | 0.071 | 36.264 | 0.873 | 110.196 | 0.29 > 128 | 96 | 14.594 | 0.055 | 26.033 | 0.058 | 36.734 | 0.672 | 111.411 | 0.602 > 128 | 112 | 14.77 | 0.078 | 27.594 | 0.033 | 36.482 | 1.513 | 112.238 | 0.6 > 256 | 1 | 23.998 | 0.019 | 45.146 | 0.131 | 68.831 | 1.058 | 224.967 | 0.392 > 256 | 32 | 26.273 | 0.036 | 52.402 | 0.038 | 71.686 | 4.776 | 217.511 | 1.667 > 256 | 48 | 26.61 | 0.063 | 52.949 | 0.317 | 70.867 | 2.41 | 220.549 | 0.41 > 256 | 64 | 26.959 | 0.085 | 53.824 | 0.367 | 72.771 | 1.423 | 220.805 | 0.952 > 256 | 80 | 27.646 | 0.089 | 53.927 | 1.035 | 73.949 | 2.102 | 220.814 | 0.498 > 256 | 96 | 27.829 | 0.128 | 54.665 | 0.029 | 75.791 | 3.527 | 222.571 | 0.875 > 256 | 112 | 28.298 | 0.064 | 53.902 | 0.237 | 75.996 | 3.266 | 224.626 | 1.752 > 512 | 1 | 48.181 | 0.032 | 88.372 | 0.299 | 147.922 | 7.862 | 487.557 | 1.454 > 512 | 32 | 53.157 | 0.044 | 108.089 | 0.124 | 151.998 | 3.999 | 462.369 | 0.917 > 512 | 48 | 55.13 | 0.052 | 109.149 | 0.77 | 160.646 | 28.419 | 456.265 | 1.198 > 512 | 64 | 56.609 | 0.123 | 110.346 | 0.729 | 158.9 | 16.885 | 464.728 | 3.811 > 512 | 80 | 57.146 | 
0.091 | 110.808 | 0.295 | 157.446 | 11.494 | 454.655 | 4.941 > 512 | 96 | 59.038 | 0.092 | 111.117 | 0.101 | 154.412 | 5.113 | 465.095 | 1.281 > 512 | 112 | 60.647 | 0.331 | 110.216 | 0.153 | 155.93 | 9.2 | 489.859 | 0.988 > 768 | 1 | 77.086 | 0.402 | 139.595 | 0.839 | 191.497 | 6.112 | 1998.335 | 5729.012 > 768 | 32 | 89.599 | 0.14 | 159.535 | 0.816 | 230.192 | 2.105 | 1663.619 | 5404.593 > 768 | 48 | 94.312 | 0.33 | 164.865 | 0.493 | 234.917 | 12.344 | 1737.604 | 5687.615 > 768 | 64 | 94.243 | 0.218 | 166.708 | 0.498 | 234.764 | 10.555 | 1717.1 | 5527.53 > 768 | 8... Hi @bulasevich, Thank you for the data very much. They are very useful. I am planning to add some changes to the benchmark: 1. Address Vladimir's comments about having different size nmethods. 2. Add calling of methods without reflection: static calls, vtable calls, itable calls. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2737596648 From bulasevich at openjdk.org Wed Mar 19 18:23:09 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 19 Mar 2025 18:23:09 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 22:23:23 GMT, Evgeny Astigeevich wrote: > This benchmark is used to check performance impact of the code cache being sparse. > > We use C2 compiler to compile the same Java method multiple times to produce as many code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods. 
> > Results: code region size 2M (2097152) bytes > - Intel Xeon Platinum 8259CL > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |19.577 |0.619 |us/op | | > |128 |32 |4 |22.968 |0.314 |us/op |17.30% | > |128 |48 |3 |22.245 |0.388 |us/op |13.60% | > |128 |64 |2 |23.874 |0.84 |us/op |21.90% | > |128 |80 |2 |23.786 |0.231 |us/op |21.50% | > |128 |96 |1 |26.224 |1.16 |us/op |34% | > |128 |112 |1 |27.028 |0.461 |us/op |38.10% | > |256 |1 |256 |47.43 |1.146 |us/op | | > |256 |32 |8 |63.962 |1.671 |us/op |34.90% | > |256 |48 |5 |63.396 |0.247 |us/op |33.70% | > |256 |64 |4 |66.604 |2.286 |us/op |40.40% | > |256 |80 |3 |59.746 |1.273 |us/op |26% | > |256 |96 |3 |63.836 |1.034 |us/op |34.60% | > |256 |112 |2 |63.538 |1.814 |us/op |34% | > |512 |1 |512 |172.731 |4.409 |us/op | | > |512 |32 |16 |206.772 |6.229 |us/op |19.70% | > |512 |48 |11 |215.275 |2.228 |us/op |24.60% | > |512 |64 |8 |212.962 |2.028 |us/op |23.30% | > |512 |80 |6 |201.335 |12.519 |us/op |16.60% | > |512 |96 |5 |198.133 |6.502 |us/op |14.70% | > |512 |112 |5 |193.739 |3.812 |us/op |12.20% | > |768 |1 |768 |325.154 |5.048 |us/op | | > |768 |32 |24 |346.298 |20.196 |us/op |6.50% | > |768 |48 |16 |350.746 |2.931 |us/op |7.90% | > |768 |64 |12 |339.445 |7.927 |us/op |4.40% | > |768 |80 |10 |347.408 |7.355 |us/op |6.80% | > |768 |96 |8 |340.983 |3.578 |us/op |4.90% | > |768 |112 |7 |353.949 |2.98 |us/op |8.90% | > |1024 |1 |1024 |368.352 |5.961 |us/op | | > |1024 |32 |32 |463.822 |6.274 |us/op |25.90% | > |1024 |48 |21 |457.674 |15.144 |us/op |24.20% | > |1024 |64 |16 |477.694 |0.986 |us/op |29.70% | > |1024 |80 |13 |484.901 |32.601 |us/op |31.60% | > |1024 |96 |11 |480.8 |27.088 |us/op |30.50% | > |1024 |112 |9 |474.416 |10.053 |us/op |28.80% | > > - AArch64 Neoverse N1 > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |25.297 |0.792 |us/op | | > 
|128 |32 |4 |31.451... test/micro/org/openjdk/bench/vm/compiler/SparseCodeCache.java line 96: > 94: private static Object WB; > 95: > 96: @Param({"128", "256", "512", "768", "1024"}) Big number of parameters is good for research purposes, but I think we should limit it for those people who run full microbenchmarks set to find regressions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23831#discussion_r2003982756 From bulasevich at openjdk.org Wed Mar 19 18:38:15 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 19 Mar 2025 18:38:15 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 22:23:23 GMT, Evgeny Astigeevich wrote: > This benchmark is used to check performance impact of the code cache being sparse. > > We use C2 compiler to compile the same Java method multiple times to produce as many code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods. 
> > Results: code region size 2M (2097152) bytes > - Intel Xeon Platinum 8259CL > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |19.577 |0.619 |us/op | | > |128 |32 |4 |22.968 |0.314 |us/op |17.30% | > |128 |48 |3 |22.245 |0.388 |us/op |13.60% | > |128 |64 |2 |23.874 |0.84 |us/op |21.90% | > |128 |80 |2 |23.786 |0.231 |us/op |21.50% | > |128 |96 |1 |26.224 |1.16 |us/op |34% | > |128 |112 |1 |27.028 |0.461 |us/op |38.10% | > |256 |1 |256 |47.43 |1.146 |us/op | | > |256 |32 |8 |63.962 |1.671 |us/op |34.90% | > |256 |48 |5 |63.396 |0.247 |us/op |33.70% | > |256 |64 |4 |66.604 |2.286 |us/op |40.40% | > |256 |80 |3 |59.746 |1.273 |us/op |26% | > |256 |96 |3 |63.836 |1.034 |us/op |34.60% | > |256 |112 |2 |63.538 |1.814 |us/op |34% | > |512 |1 |512 |172.731 |4.409 |us/op | | > |512 |32 |16 |206.772 |6.229 |us/op |19.70% | > |512 |48 |11 |215.275 |2.228 |us/op |24.60% | > |512 |64 |8 |212.962 |2.028 |us/op |23.30% | > |512 |80 |6 |201.335 |12.519 |us/op |16.60% | > |512 |96 |5 |198.133 |6.502 |us/op |14.70% | > |512 |112 |5 |193.739 |3.812 |us/op |12.20% | > |768 |1 |768 |325.154 |5.048 |us/op | | > |768 |32 |24 |346.298 |20.196 |us/op |6.50% | > |768 |48 |16 |350.746 |2.931 |us/op |7.90% | > |768 |64 |12 |339.445 |7.927 |us/op |4.40% | > |768 |80 |10 |347.408 |7.355 |us/op |6.80% | > |768 |96 |8 |340.983 |3.578 |us/op |4.90% | > |768 |112 |7 |353.949 |2.98 |us/op |8.90% | > |1024 |1 |1024 |368.352 |5.961 |us/op | | > |1024 |32 |32 |463.822 |6.274 |us/op |25.90% | > |1024 |48 |21 |457.674 |15.144 |us/op |24.20% | > |1024 |64 |16 |477.694 |0.986 |us/op |29.70% | > |1024 |80 |13 |484.901 |32.601 |us/op |31.60% | > |1024 |96 |11 |480.8 |27.088 |us/op |30.50% | > |1024 |112 |9 |474.416 |10.053 |us/op |28.80% | > > - AArch64 Neoverse N1 > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |25.297 |0.792 |us/op | | > 
|128 |32 |4 |31.451...

test/micro/org/openjdk/bench/vm/compiler/SparseCodeCache.java line 280:

> 278:                 var lastNmethodInPrevGroup = methods[j - 1].getNMethod();
> 279:                 if ((lastNmethodInPrevGroup.address + lastNmethodInPrevGroup.size) < regionEnd) {
> 280:                     getWhiteBox().allocateCodeBlob(regionEnd - lastNmethodInPrevGroup.address - lastNmethodInPrevGroup.size,

I would add a comment here:

// Here we assume that (1) the CodeCache is not fragmented, (2) other C2 threads and CodeCache cleaner
// do not interfere heavily, and we rely on the fact that (3) methods are allocated from left to right.
// Given this, we allocate a large padding and expect the next compiled methods to be allocated in a subsequent codeRegion

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23831#discussion_r2004008513

From chagedorn at openjdk.org Wed Mar 19 20:07:09 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 19 Mar 2025 20:07:09 GMT
Subject: RFR: 8351627: C2 AArch64 ROR/ROL: assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value [v2]
In-Reply-To:
References: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com>
Message-ID:

On Tue, 18 Mar 2025 03:51:55 GMT, Xiaohong Gong wrote:

>> The following assertion fails on AArch64:
>>
>> Internal Error (jdk/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:2991), pid=3822987, tid=3823007
>> assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value
>>
>> with a simple Vector API case:
>>
>> public static IntVector test() {
>>     IntVector iv = IntVector.zero(IntVector.SPECIES_128);
>>     return iv.lanewise(VectorOperators.ROR, iv);
>> }
>>
>> On AArch64, vector `ROR/ROL` (rotate right/left) operations are implemented with a combination of shifts.
Please see the pattern for `ROR`:
>>
>>     lsr dst1, src, cnt            // unsigned right shift
>>     lsl dst2, src, bitSize - cnt  // left shift
>>     orr dst, dst1, dst2           // logical or
>>
>> where `bitSize` is the element type width (e.g. `32` for `int`). In the above case, `cnt` is a zero constant, resulting in a left shift of 32 (`bitSize - 0`), which exceeds the instruction's valid shift count range and triggers the assertion. To fix this, we need to mask the shift count to ensure it stays within the valid range when calculating shift counts for rotate operations: `shiftCnt = shiftCnt & (bitSize - 1)`.
>>
>> Note that the mask is only necessary for constant shift counts. This not only fixes the assertion failure, but also allows `ROR/ROL src, 0` to be optimized to `src` directly.
>>
>> For vector variables as shift counts, the masking can be safely omitted because:
>> 1. Vector shift instructions that take a vector register as the shift count may not automatically apply modulo arithmetic based on register size. When the shift count is `32` for int type, the result may be either `zeros` or `src`. However, this doesn't affect correctness for rotate since the final result is combined with `src` using a logical `OR` operation.
>> 2. It saves a vector logical `AND` for masking, which benefits performance.
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update the test case

Looks good to me, too.

-------------

Marked as reviewed by chagedorn (Reviewer).
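
The masking rule quoted in this thread, `shiftCnt = shiftCnt & (bitSize - 1)`, can be illustrated with a scalar C++ sketch. This is not the HotSpot code: `ror32` and its local names are hypothetical, and masking the complementary left-shift count as well is an assumption made here so that a rotate by 0 (or by 32) never turns into a shift by the full bit width.

```cpp
#include <cstdint>

// Hypothetical scalar model of the lsr/lsl/orr rotate pattern.
// Both shift counts are masked into [0, 31], so no shift ever uses
// a count equal to the element width (which would be invalid).
inline std::uint32_t ror32(std::uint32_t src, unsigned cnt) {
  constexpr unsigned kBits = 32;                    // bitSize for int lanes
  const unsigned right = cnt & (kBits - 1);         // shiftCnt & (bitSize - 1)
  const unsigned left = (kBits - right) & (kBits - 1);
  // For right == 0 both shifts are by 0, so the result is src | src == src.
  return (src >> right) | (src << left);
}
```

With this shape, `ror32(x, 0)` and `ror32(x, 32)` both return `x`, which mirrors the remark that a rotate by zero can be folded to the source operand.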
PR Review: https://git.openjdk.org/jdk/pull/24051#pullrequestreview-2700036299 From sviswanathan at openjdk.org Wed Mar 19 23:48:10 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 19 Mar 2025 23:48:10 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 06:25:38 GMT, Jatin Bhateja wrote: >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 >> >> The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). >> >> Summary of changes included with the patch: >> 1. C2 compiler New Vector IR creation. >> 2. Auto-vectorization support. >> 3. x86 backend implementation. >> 4. New IR verification test for each newly supported vector operation. >> >> Following are the performance numbers of Float16OperationsBenchmark >> >> System : Intel(R) Xeon(R) Processor code-named Granite rapids >> Frequency fixed at 2.5 GHz >> >> >> Baseline >> Benchmark (vectorDim) Mode Cnt Score Error Units >> Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms >> Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms >> Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms >> Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms >> Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms >> Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms >> Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms >> Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms >> Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms >> 
Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms >> Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms >> Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms >> Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms >> Float16OperationsBenchmark.isInfiniteCMovBen... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 > - Updating benchmark > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 > - Updating copyright > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 > - Add MinVHF/MaxVHF to commutative op list > - Auto Vectorization support for Float16 operations. There is a test failure in GHA. A merge with master would be good. test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 27: > 25: /** > 26: * @test > 27: * @bug 8346236 Please include key randomness here. test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 221: > 219: public void checkResultFma() { > 220: for (int i = 0; i < LEN; ++i) { > 221: short expected = floatToFloat16(Math.fma(float16ToFloat(input1[i]), float16ToFloat(input2[i]), float16ToFloat(input3[i]))); The expected for fma should be either implemented on similar lines as Float16.fma() or we could call Float16.fma here directly. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/22755#issuecomment-2738538050
PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r2004399256
PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r2004483781

From dlong at openjdk.org Thu Mar 20 01:36:21 2025
From: dlong at openjdk.org (Dean Long)
Date: Thu, 20 Mar 2025 01:36:21 GMT
Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int'
In-Reply-To: <0pScaTfNpM4LYdukKYZIdk1BluQVvUM2BdssI4bjisI=.d92d627f-ce02-406c-bed3-38d31692c2ef@github.com>
References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com>
 <0pScaTfNpM4LYdukKYZIdk1BluQVvUM2BdssI4bjisI=.d92d627f-ce02-406c-bed3-38d31692c2ef@github.com>
Message-ID:

On Mon, 17 Mar 2025 18:05:11 GMT, Tom Rodriguez wrote:

>> Block::_freq is the number of times this block is executed per call of the method. It can be a big number for blocks in a loop and very small on an infrequent path.
>>
>> `succ_prob()` is calculated based on the frequencies of two blocks and/or the corresponding branch probability: [gcm.cpp#L2100](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/gcm.cpp#L2100)
>>
>> `freq = b->_freq * b->succ_prob(j)` is the number of times we take this outgoing path. So to calculate the probability of taking this path in the `target` block, we divide `freq` by the number of times the `target` block is executed.
>>
>> This assumes that `b->_freq` <= `target->_freq`. That seems not to be true in this case, which indicates a bug in how we calculate and update block frequencies.
>
> I agree with Vladimir that it seems like something is wrong with the block probabilities. In product it would be fine to simply clamp these values in the range of 0..100 since they are just used to compute `CFGEdge::_infrequent` so the worst thing you get is a less good layout.
Refactoring the expressions so it's clearer what the requirements are wouldn't hurt either.

There may be a bug in frequency propagation. I don't understand the connector/non-connector logic, but when I reproduce this, the successor has a loop block with high _freq, but then we use non_connector_successor() to get the successor, and that gives us instead a different block which originally had 0 _freq, but got changed to MIN_BLOCK_FREQUENCY by CFGLoop::scale_freq().

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r2004619271

From xgong at openjdk.org Thu Mar 20 01:50:10 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Thu, 20 Mar 2025 01:50:10 GMT
Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2]
In-Reply-To: <5fk0CfImI-utRMfmsA78i_uQJjoej9bsLqxsAbRZHLk=.b5aece2b-090d-4fbc-aa64-3acf8fedc41b@github.com>
References: <5fk0CfImI-utRMfmsA78i_uQJjoej9bsLqxsAbRZHLk=.b5aece2b-090d-4fbc-aa64-3acf8fedc41b@github.com>
Message-ID:

On Thu, 13 Mar 2025 09:32:20 GMT, Emanuel Peter wrote:

>> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>>
>> - Merge branch 'jdk:master' into JDK_8349522
>> - 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations
>>
>> Since PR [1] has added several new vector operations in VectorAPI
>> and the X86 backend implementation for them, this patch adds the
>> AArch64 backend part for NEON/SVE architectures.
>>
>> The performance of Vector API related jmh micro benchmarks can
>> improve about 70x ~ 95x on an AArch64 128-bit vector length sve2
>> architecture with different UseSVE options.
Here is the uplift >> details: >> >> ``` >> Benchmark (size) Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2 >> ByteMaxVector.SADD 1024 thrpt 30 80.69x 79.70x 80.534x >> ByteMaxVector.SADDMasked 1024 thrpt 30 84.08x 85.72x 85.901x >> ByteMaxVector.SSUB 1024 thrpt 30 80.46x 80.27x 81.063x >> ByteMaxVector.SSUBMasked 1024 thrpt 30 83.96x 85.26x 85.887x >> ByteMaxVector.SUADD 1024 thrpt 30 80.43x 80.36x 81.761x >> ByteMaxVector.SUADDMasked 1024 thrpt 30 83.40x 84.62x 85.199x >> ByteMaxVector.SUSUB 1024 thrpt 30 79.93x 79.22x 79.714x >> ByteMaxVector.SUSUBMasked 1024 thrpt 30 82.93x 85.02x 84.726x >> ByteMaxVector.UMAX 1024 thrpt 30 78.73x 77.39x 78.220x >> ByteMaxVector.UMAXMasked 1024 thrpt 30 82.62x 84.77x 85.531x >> ByteMaxVector.UMIN 1024 thrpt 30 79.04x 77.80x 78.471x >> ByteMaxVector.UMINMasked 1024 thrpt 30 83.11x 84.86x 86.126x >> IntMaxVector.SADD 1024 thrpt 30 83.11x 83.07x 83.183x >> IntMaxVector.SADDMasked 1024 thrpt 30 90.67x 91.80x 93.162x >> IntMaxVector.SSUB 1024 thrpt 30 83.37x 82.82x 83.317x >> IntMaxVector.SSUBMasked 1024 thrpt 30 90.85x 92.87x 94.201x >> IntMaxVector.SUADD 1024 thrpt 30 82.76x 81.78x 82.679x >> IntMaxVector.SUADDMasked 1024 thrpt 30 90.49x 91.93x 93.155x >> IntMaxVector.SUSUB 1024 thrpt 30 82.92x 82.34x 82.525x >> IntMaxVector.SUSUBMasked 1024 thrpt 30 90.60x ... > > I'm getting this failure with `-XX:UseAVX=1` on x64. It is a new test you added. 
>
> Failed IR Rules (1) of Methods (1)
> ----------------------------------
> 1) Method "public void compiler.vectorapi.VectorSaturatedOperationsTest.susub_masked()" - [Failed IR rules: 1]:
>    * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={BEFORE_MATCHING}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"avx", "true", "asimd", "true"}, counts={"_#V#SATURATING_SUB_VL#_", " >0 ", "unsigned_vector_node", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
>      > Phase "Before matching":
>        - counts: Graph contains wrong number of nodes:
>          * Constraint 1: "(\\d+(\\s){2}(SaturatingSubV.*)+(\\s){2}===.*vector[A-Za-z])"
>            - Failed comparison: [found] 0 > 0 [given]
>            - No nodes matched!
>          * Constraint 2: "unsigned_vector_node"
>            - Failed comparison: [found] 0 > 0 [given]
>            - No nodes matched!

Hi @eme64 , may I know what the test's status is? It seems the failure on linux-x64 is not caused by this PR. Is it possible to move on? Please let me know if there are any other issues. Thanks a lot!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23608#issuecomment-2738830986

From xgong at openjdk.org Thu Mar 20 02:11:12 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Thu, 20 Mar 2025 02:11:12 GMT
Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 18 Mar 2025 08:01:06 GMT, Xiaohong Gong wrote:

>> Did you try without? The default warmup should be sufficient I think. But I could be wrong.
>
> Yes, actually it can pass without this sometimes.

I'm afraid the IR test would fail in the future, as I have hit random failures before on other IR tests. I also checked some existing tests under vectorapi; almost all have either 5000 or 10000 warmup. Maybe I can use a smaller warmup like `5000`. What do you think?
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r2004667794

From dlong at openjdk.org Thu Mar 20 02:22:07 2025
From: dlong at openjdk.org (Dean Long)
Date: Thu, 20 Mar 2025 02:22:07 GMT
Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 19 Mar 2025 17:52:46 GMT, Vladimir Kozlov wrote:

>> Before [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789) `relocation_begin()` was never null even when there were no relocations - it pointed to the beginning of the constant or code section in such a case. It was used by relocation code to simplify code and avoid null checks.
>> With that fix `relocation_begin()` points to the address in the `CodeBlob::_mutable_data` field, which could be `nullptr` if there is no relocation and metadata.
>>
>> The easy fix is to avoid `nullptr` in `CodeBlob::_mutable_data`. We could do that similar to what we do for `nmethod::_immutable_data`: [nmethod.cpp#L1514](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1514).
>>
>> Tested tier1-4, stress, xcomp. Verified with failed tests listed in bug report.
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update field default setting

src/hotspot/share/code/codeBlob.cpp line 187:

> 185:   if (_mutable_data != blob_end()) {
> 186:     os::free(_mutable_data);
> 187:     _mutable_data = blob_end(); // Valid not null address

Do we still need this to be a valid address after purge(), or can we set it to nullptr here?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24102#discussion_r2004678576 From xgong at openjdk.org Thu Mar 20 02:46:48 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 20 Mar 2025 02:46:48 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v5] In-Reply-To: References: Message-ID: > The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. > > Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. > > This patch added the rearrange support for vector types with small lane count. Here are the main changes: > - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) > - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation > - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one > > Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. 
> > 1) NEON > > JMH on panama-vector:vectorIntrinsics: > > Benchmark (size) Mode Cnt Units Before After Gain > Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x > Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x > Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x > Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x > Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x > Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x > Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x > Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x > Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x > Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x > Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x > Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.999 747.848 10.38x > > > JMH on jdk mainline: > > Benchmark ... Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Use a smaller warmup and array length in IR test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23790/files - new: https://git.openjdk.org/jdk/pull/23790/files/1cbff61f..5249c9ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23790&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23790&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23790.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23790/head:pull/23790 PR: https://git.openjdk.org/jdk/pull/23790 From xgong at openjdk.org Thu Mar 20 02:46:48 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 20 Mar 2025 02:46:48 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 07:50:51 GMT, Emanuel Peter wrote: >> Yes, I 
think we'd better use a larger warmup to make sure the vector api intrinsics are inlined in C2, so that the IR check can pass.
>
> Did you try without? The default warmup should be sufficient I think. But I could be wrong.

Hi @eme64 , the warmup is changed to use `5000`. I also changed the array length to a smaller value. No failures have been seen so far. Could you please take a look at it again? Thanks a lot!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23790#discussion_r2004720536

From fyang at openjdk.org Thu Mar 20 03:54:40 2025
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 20 Mar 2025 03:54:40 GMT
Subject: RFR: 8352477: RISC-V: Print warnings when unsupported intrinsics are enabled
Message-ID:

Hi, please consider this small change.

`UsePoly1305Intrinsics`, `UseMD5Intrinsics` and `UseSHA1Intrinsics` depend on `!AvoidUnalignedAccesses` and thus are unavailable on platforms with slow unaligned accesses. But these options could still be enabled on the command line, which I think could be surprising to our end users, as these intrinsics will only have a negative impact on performance numbers for such platforms. It seems more reasonable to me to print warnings and keep them disabled when enabled by the user on such platforms.

ubuntu at premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UseMD5Intrinsics -version
OpenJDK 64-Bit Server VM warning: Intrinsics for MD5 crypto hash functions not available on this CPU.
openjdk version "25-internal" 2025-09-16
OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk)
OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing)

ubuntu at premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UsePoly1305Intrinsics -version
OpenJDK 64-Bit Server VM warning: Intrinsics for Poly1305 crypto hash functions not available on this CPU.
openjdk version "25-internal" 2025-09-16 OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk) OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing) ubuntu at premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UseSHA1Intrinsics -version OpenJDK 64-Bit Server VM warning: Intrinsics for SHA-1 crypto hash functions not available on this CPU. openjdk version "25-internal" 2025-09-16 OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk) OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing) ------------- Commit messages: - 8352477: RISC-V: Print warnings when unsupported intrinsics are enabled Changes: https://git.openjdk.org/jdk/pull/24123/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24123&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352477 Stats: 17 lines in 1 file changed: 9 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24123.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24123/head:pull/24123 PR: https://git.openjdk.org/jdk/pull/24123 From kvn at openjdk.org Thu Mar 20 05:23:07 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 20 Mar 2025 05:23:07 GMT Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 02:19:43 GMT, Dean Long wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Update field default setting > > src/hotspot/share/code/codeBlob.cpp line 187: > >> 185: if (_mutable_data != blob_end()) { >> 186: os::free(_mutable_data); >> 187: _mutable_data = blob_end(); // Valid not null address > > Do we still need this to be a valid address after purge(), or can we set it to nullptr here? Then we will have to check for nullptr too I think. I prefer to have only 2 states of the field value. 
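
The trade-off discussed in this exchange can be sketched in a few lines of standalone C++. The class below is hypothetical (`Blob`, `data_begin`, `owns_data` and the sentinel byte are illustrative names, not HotSpot's `CodeBlob`). It only shows the idea from the quoted patch: point the field at a valid non-null address instead of `nullptr`, so the field has two states (owned buffer or sentinel) and pointer arithmetic on it never starts from null.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a code blob; not HotSpot's CodeBlob.
class Blob {
 public:
  explicit Blob(std::size_t size) {
    if (size > 0) {
      void* p = std::malloc(size);
      if (p == nullptr) std::abort();  // sketch only: no recovery path
      data_ = static_cast<char*>(p);
      size_ = size;
    }
  }
  ~Blob() { purge(); }
  Blob(const Blob&) = delete;
  Blob& operator=(const Blob&) = delete;

  // Free the buffer, if any, and fall back to the sentinel, not nullptr.
  void purge() {
    if (owns_data()) {
      std::free(data_);
      data_ = &sentinel_;
      size_ = 0;
    }
  }

  bool owns_data() const { return data_ != &sentinel_; }
  char* data_begin() { return data_; }
  char* data_end() { return data_ + size_; }  // offset 0 when empty

 private:
  char sentinel_ = 0;        // valid address used when there is no buffer
  char* data_ = &sentinel_;  // never nullptr
  std::size_t size_ = 0;
};
```

A third state (`nullptr` after `purge()`) would force every reader of the field to add a null check, which is the cost the reply above wants to avoid.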
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24102#discussion_r2004829688 From chagedorn at openjdk.org Thu Mar 20 06:08:09 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 20 Mar 2025 06:08:09 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v6] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 16:18:33 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. >> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". >> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23871#pullrequestreview-2701367022 From epeter at openjdk.org Thu Mar 20 06:16:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Mar 2025 06:16:18 GMT Subject: RFR: 8352020: [CompileFramework] enable compilation for VectorAPI [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 17:14:55 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> typo > > Very good. @vnkozlov @chhagedorn thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24082#issuecomment-2739319770 From epeter at openjdk.org Thu Mar 20 06:16:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Mar 2025 06:16:19 GMT Subject: Integrated: 8352020: [CompileFramework] enable compilation for VectorAPI In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 15:53:17 GMT, Emanuel Peter wrote: > During work on [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942) I discovered that it is currently not possible to compile VectorAPI code because it is still in incubator mode and needs flag "--add-modules=jdk.incubator.vector" for "javac". > > Also: "javac" can produce warnings, and that leads to issues like this: [JDK-8351998](https://bugs.openjdk.org/browse/JDK-8351998). We should allow such warnings, they are not compile failures. > > Example: > > javac --add-modules=jdk.incubator.vector Test.java > warning: [incubating] using incubating module(s): jdk.incubator.vector > 1 warning > > > I added an example test as well. This pull request has now been integrated. 
Changeset: 3ed010ab Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/3ed010ab7cf5b8c9bf8fa000e88ea95285351982 Stats: 105 lines in 3 files changed: 99 ins; 0 del; 6 mod 8352020: [CompileFramework] enable compilation for VectorAPI Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/24082 From epeter at openjdk.org Thu Mar 20 06:17:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Mar 2025 06:17:15 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 10:33:47 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > Looks good to me too. @TobiHartmann thanks for the review! @chhagedorn thanks for all the help and the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24049#issuecomment-2739318935 From epeter at openjdk.org Thu Mar 20 06:17:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Mar 2025 06:17:16 GMT Subject: Integrated: 8351952: [IR Framework]: allow ignoring methods that are not compilable In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 08:48:21 GMT, Emanuel Peter wrote: > With the Template Framework, I'm generating IR tests randomly. But random code can always hit bailouts in compilation, and make code not compilable any more. We should have a way to disable this check, and just gracefully continue to execute the tests. 
> > To allow a single test method to be `not compilable`: > https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L160-L161 > > To allow all test methods to be `not compilable`: > https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L140-L144 > > See also this documentation in the code: > https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/compiler/lib/ir_framework/Test.java#L88-L94 > > --------------------------------------- > > **Background** > > My random code seems to hit a bailout in the Register Allocator, and I cannot do much to predict if that bailout happens. > See https://bugs.openjdk.org/browse/JDK-8304328 This pull request has now been integrated. Changeset: fb210e3a Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/fb210e3a7174bca1da112216158b2c1dede6dc34 Stats: 442 lines in 17 files changed: 415 ins; 0 del; 27 mod 8351952: [IR Framework]: allow ignoring methods that are not compilable Co-authored-by: Christian Hagedorn Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24049 From xgong at openjdk.org Thu Mar 20 06:48:10 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 20 Mar 2025 06:48:10 GMT Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2] In-Reply-To: References: <5fk0CfImI-utRMfmsA78i_uQJjoej9bsLqxsAbRZHLk=.b5aece2b-090d-4fbc-aa64-3acf8fedc41b@github.com> Message-ID: On Thu, 20 Mar 2025 01:47:13 GMT, Xiaohong Gong wrote: >> I'm getting this failure with `-XX:UseAVX=1` on x64. It is a new test you added. 
>> >> Failed IR Rules (1) of Methods (1) >> ---------------------------------- >> 1) Method "public void compiler.vectorapi.VectorSaturatedOperationsTest.susub_masked()" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={BEFORE_MATCHING}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"avx", "true", "asimd", "true"}, counts={"_#V#SATURATING_SUB_VL#_", " >0 ", "unsigned_vector_node", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" >> > Phase "Before matching": >> - counts: Graph contains wrong number of nodes: >> * Constraint 1: "(\\d+(\\s){2}(SaturatingSubV.*)+(\\s){2}===.*vector[A-Za-z])" >> - Failed comparison: [found] 0 > 0 [given] >> - No nodes matched! >> * Constraint 2: "unsigned_vector_node" >> - Failed comparison: [found] 0 > 0 [given] >> - No nodes matched! > > Hi @eme64 , may I know what the test's status is? Seems the failure on linux-x64 is not caused by this PR. Is it possible to move on? Please let me know if any other issues. Thanks a lot! > @XiaohongGong Can you please merge with master before I launch testing? Sure, I will do a merge soon. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23608#issuecomment-2739369292 From epeter at openjdk.org Thu Mar 20 06:48:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Mar 2025 06:48:10 GMT Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2] In-Reply-To: References: <5fk0CfImI-utRMfmsA78i_uQJjoej9bsLqxsAbRZHLk=.b5aece2b-090d-4fbc-aa64-3acf8fedc41b@github.com> Message-ID: On Thu, 20 Mar 2025 01:47:13 GMT, Xiaohong Gong wrote: >> I'm getting this failure with `-XX:UseAVX=1` on x64. It is a new test you added. 
>>
>> Failed IR Rules (1) of Methods (1)
>> ----------------------------------
>> 1) Method "public void compiler.vectorapi.VectorSaturatedOperationsTest.susub_masked()" - [Failed IR rules: 1]:
>>    * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={BEFORE_MATCHING}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"avx", "true", "asimd", "true"}, counts={"_#V#SATURATING_SUB_VL#_", " >0 ", "unsigned_vector_node", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
>>      > Phase "Before matching":
>>        - counts: Graph contains wrong number of nodes:
>>          * Constraint 1: "(\\d+(\\s){2}(SaturatingSubV.*)+(\\s){2}===.*vector[A-Za-z])"
>>            - Failed comparison: [found] 0 > 0 [given]
>>            - No nodes matched!
>>          * Constraint 2: "unsigned_vector_node"
>>            - Failed comparison: [found] 0 > 0 [given]
>>            - No nodes matched!
>
> Hi @eme64 , may I know what the test's status is? Seems the failure on linux-x64 is not caused by this PR. Is it possible to move on? Please let me know if any other issues. Thanks a lot!

@XiaohongGong Can you please merge with master before I launch testing?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23608#issuecomment-2739363291

From epeter at openjdk.org Thu Mar 20 07:04:09 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 20 Mar 2025 07:04:09 GMT
Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors
In-Reply-To:
References:
Message-ID:

On Tue, 18 Mar 2025 02:36:44 GMT, Xiaohong Gong wrote:

>>> Thanks for looking at this PR again @eme64 ! Vector API has its own jtreg tests under `test/jdk/jdk/incubator/vector/`. I double checked that it has the `rearrange` test for all vector species. Please see one of the tests here: https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Long128VectorTests.java#L4954 That's also why I did not add the correctness tests in the IR test file.
>> >> Alright. I think result verification would still be good practice, and not too difficult to do using a `@Check` method and `Verify.java` for comparing the resulting arrays. But I leave that up to you. In my experience, the VectorAPI test coverage is not as good as I first thought, see the list of bugs I recently found: >> https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC >> >> So adding a little more rigor to your IR test could catch possible bugs that the existing tests simply do not cover. > > Hi @eme64 , the IR test is updated according to your suggestion. Could you please look at it again? Thanks so much! @XiaohongGong Could you please also merge here before I rerun the testing? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2739397231 From xgong at openjdk.org Thu Mar 20 07:06:31 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 20 Mar 2025 07:06:31 GMT Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v4] In-Reply-To: References: Message-ID: > Since PR [1] has added several new vector operations in VectorAPI and the X86 backend implementation for them, this patch adds the AArch64 backend part for NEON/SVE architectures. > > The performance of Vector API relative JMH micro benchmarks can improve about 70x ~ 95x on a NVIDIA Grace CPU, which is a 128-bit vector length sve2 architecture, with different UseSVE options. 
Here is the gain details: > > > Benchmark (size) Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2 > ByteMaxVector.SADD 1024 thrpt 30 80.69x 79.70x 80.534x > ByteMaxVector.SADDMasked 1024 thrpt 30 84.08x 85.72x 85.901x > ByteMaxVector.SSUB 1024 thrpt 30 80.46x 80.27x 81.063x > ByteMaxVector.SSUBMasked 1024 thrpt 30 83.96x 85.26x 85.887x > ByteMaxVector.SUADD 1024 thrpt 30 80.43x 80.36x 81.761x > ByteMaxVector.SUADDMasked 1024 thrpt 30 83.40x 84.62x 85.199x > ByteMaxVector.SUSUB 1024 thrpt 30 79.93x 79.22x 79.714x > ByteMaxVector.SUSUBMasked 1024 thrpt 30 82.93x 85.02x 84.726x > ByteMaxVector.UMAX 1024 thrpt 30 78.73x 77.39x 78.220x > ByteMaxVector.UMAXMasked 1024 thrpt 30 82.62x 84.77x 85.531x > ByteMaxVector.UMIN 1024 thrpt 30 79.04x 77.80x 78.471x > ByteMaxVector.UMINMasked 1024 thrpt 30 83.11x 84.86x 86.126x > IntMaxVector.SADD 1024 thrpt 30 83.11x 83.07x 83.183x > IntMaxVector.SADDMasked 1024 thrpt 30 90.67x 91.80x 93.162x > IntMaxVector.SSUB 1024 thrpt 30 83.37x 82.82x 83.317x > IntMaxVector.SSUBMasked 1024 thrpt 30 90.85x 92.87x 94.201x > IntMaxVector.SUADD 1024 thrpt 30 82.76x 81.78x 82.679x > IntMaxVector.SUADDMasked 1024 thrpt 30 90.49x 91.93x 93.155x > IntMaxVector.SUSUB 1024 thrpt 30 82.92x 82.34x 82.525x > IntMaxVector.SUSUBMasked 1024 thrpt 30 90.60x 92.12x 92.951x > IntMaxVector.UMAX 1024 thrpt 30 82.40x 81.85x 82.242x > IntMaxVector.UMAXMasked 1024 thrpt 30 90.30x 92.10x 92.587x > IntMaxVector.UMIN 1024 thrpt 30 82.84x 81.43x 82.801x > IntMaxVector.UMINMasked 1024 thrpt 30 90.43x 91.49x 92.678x > LongMaxVector.SADD 1024 thrpt 30 82.01x 81.74x 82.153x > LongMaxVector... Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: - Merge branch 'jdk:master' into JDK_8349522 - Fix IR test failure on X64 with UseAVX=1 - Merge branch 'jdk:master' into JDK_8349522 - 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations Since PR [1] has added several new vector operations in VectorAPI and the X86 backend implementation for them, this patch adds the AArch64 backend part for NEON/SVE architectures. The performance of Vector API relative jmh micro benchmarks can improve about 70x ~ 95x on an AArch64 128-bit vector length sve2 architecture with different UseSVE options. Here is the uplift details: ``` Benchmark (size) Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2 ByteMaxVector.SADD 1024 thrpt 30 80.69x 79.70x 80.534x ByteMaxVector.SADDMasked 1024 thrpt 30 84.08x 85.72x 85.901x ByteMaxVector.SSUB 1024 thrpt 30 80.46x 80.27x 81.063x ByteMaxVector.SSUBMasked 1024 thrpt 30 83.96x 85.26x 85.887x ByteMaxVector.SUADD 1024 thrpt 30 80.43x 80.36x 81.761x ByteMaxVector.SUADDMasked 1024 thrpt 30 83.40x 84.62x 85.199x ByteMaxVector.SUSUB 1024 thrpt 30 79.93x 79.22x 79.714x ByteMaxVector.SUSUBMasked 1024 thrpt 30 82.93x 85.02x 84.726x ByteMaxVector.UMAX 1024 thrpt 30 78.73x 77.39x 78.220x ByteMaxVector.UMAXMasked 1024 thrpt 30 82.62x 84.77x 85.531x ByteMaxVector.UMIN 1024 thrpt 30 79.04x 77.80x 78.471x ByteMaxVector.UMINMasked 1024 thrpt 30 83.11x 84.86x 86.126x IntMaxVector.SADD 1024 thrpt 30 83.11x 83.07x 83.183x IntMaxVector.SADDMasked 1024 thrpt 30 90.67x 91.80x 93.162x IntMaxVector.SSUB 1024 thrpt 30 83.37x 82.82x 83.317x IntMaxVector.SSUBMasked 1024 thrpt 30 90.85x 92.87x 94.201x IntMaxVector.SUADD 1024 thrpt 30 82.76x 81.78x 82.679x IntMaxVector.SUADDMasked 1024 thrpt 30 90.49x 91.93x 93.155x IntMaxVector.SUSUB 1024 thrpt 30 82.92x 82.34x 82.525x IntMaxVector.SUSUBMasked 1024 thrpt 30 90.60x 92.12x 92.951x IntMaxVector.UMAX 1024 thrpt 30 82.40x 81.85x 82.242x IntMaxVector.UMAXMasked 1024 thrpt 30 90.30x 92.10x 92.587x 
IntMaxVector.UMIN 1024 thrpt 30 82.84x 81.43x 82.801x IntMaxVector.UMINMasked 1024 thrpt 30 90.43x 91.49x 92.678x LongMaxVector.SADD 1024 thrpt 30 82.01x 81.74x 82.153x LongMaxVector.SADDMasked 1024 thrpt 30 91.61x 92.69x 93.579x LongMaxVector.SSUB 1024 thrpt 30 81.97x 81.42x 82.991x LongMaxVector.SSUBMasked 1024 thrpt 30 91.34x 92.47x 93.026x LongMaxVector.SUADD 1024 thrpt 30 82.44x 81.29x 82.506x LongMaxVector.SUADDMasked 1024 thrpt 30 92.21x 92.35x 93.419x LongMaxVector.SUSUB 1024 thrpt 30 82.04x 80.98x 81.761x LongMaxVector.SUSUBMasked 1024 thrpt 30 91.74x 92.39x 93.375x LongMaxVector.UMAX 1024 thrpt 30 81.59x 80.21x 82.162x LongMaxVector.UMAXMasked 1024 thrpt 30 70.09x 92.89x 93.627x LongMaxVector.UMIN 1024 thrpt 30 82.31x 81.95x 82.298x LongMaxVector.UMINMasked 1024 thrpt 30 69.85x 92.19x 93.390x ShortMaxVector.SADD 1024 thrpt 30 80.08x 79.15x 80.310x ShortMaxVector.SADDMasked 1024 thrpt 30 90.74x 92.00x 93.743x ShortMaxVector.SSUB 1024 thrpt 30 79.54x 78.67x 80.584x ShortMaxVector.SSUBMasked 1024 thrpt 30 91.18x 92.10x 93.725x ShortMaxVector.SUADD 1024 thrpt 30 79.86x 79.37x 80.372x ShortMaxVector.SUADDMasked 1024 thrpt 30 90.17x 92.43x 93.759x ShortMaxVector.SUSUB 1024 thrpt 30 79.78x 79.85x 80.744x ShortMaxVector.SUSUBMasked 1024 thrpt 30 89.99x 91.91x 93.320x ShortMaxVector.UMAX 1024 thrpt 30 79.87x 79.81x 80.518x ShortMaxVector.UMAXMasked 1024 thrpt 30 89.69x 91.70x 92.826x ShortMaxVector.UMIN 1024 thrpt 30 79.11x 77.98x 79.458x ShortMaxVector.UMINMasked 1024 thrpt 30 90.49x 92.86x 93.323x ``` Tested with `hotspot::hotspot_all` and `jdk::jdk_all`, and no new regression is found. 
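For readers unfamiliar with the operator names in the table above, the following is a minimal scalar sketch of what SADD, SSUB, SUADD, SUSUB, UMAX and UMIN are understood to compute per `int` lane. It is not taken from the patch: the class and method names are hypothetical, and the semantics shown (clamping instead of wrapping, unsigned comparison) are an assumption based on the operator names rather than a statement about the AArch64 code the PR emits.

```java
// Hypothetical scalar reference for the per-lane semantics; not part of the PR.
final class SaturatingScalarSketch {
    // Signed saturating add: clamp to the int range instead of wrapping.
    static int sadd(int a, int b) {
        long r = (long) a + (long) b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, r));
    }

    // Signed saturating subtract.
    static int ssub(int a, int b) {
        long r = (long) a - (long) b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, r));
    }

    // Unsigned saturating add: sums above 2^32 - 1 clamp to all ones.
    static int suadd(int a, int b) {
        long r = Integer.toUnsignedLong(a) + Integer.toUnsignedLong(b);
        return r > 0xFFFF_FFFFL ? -1 : (int) r;
    }

    // Unsigned saturating subtract: differences below zero clamp to zero.
    static int susub(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    // Unsigned maximum and minimum.
    static int umax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static int umin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(sadd(Integer.MAX_VALUE, 1));
        System.out.println(Integer.toUnsignedString(suadd(-1, 1)));
        System.out.println(susub(1, 2));
        System.out.println(Integer.toUnsignedString(umax(-1, 1)));
    }
}
```

A scalar reference of this kind is also what a correctness test would typically compare vector results against, lane by lane.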
[1] https://github.com/openjdk/jdk/pull/20507 ------------- Changes: https://git.openjdk.org/jdk/pull/23608/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23608&range=03 Stats: 1151 lines in 8 files changed: 674 ins; 5 del; 472 mod Patch: https://git.openjdk.org/jdk/pull/23608.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23608/head:pull/23608 PR: https://git.openjdk.org/jdk/pull/23608 From xgong at openjdk.org Thu Mar 20 07:13:43 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 20 Mar 2025 07:13:43 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v6] In-Reply-To: References: Message-ID: > The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. > > Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. > > This patch added the rearrange support for vector types with small lane count. Here are the main changes: > - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) > - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation > - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one > > Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. 
Performance of the same JMH with other vector types remains unchanged. > > 1) NEON > > JMH on panama-vector:vectorIntrinsics: > > Benchmark (size) Mode Cnt Units Before After Gain > Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x > Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x > Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x > Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x > Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x > Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x > Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x > Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x > Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x > Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x > Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x > Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.999 747.848 10.38x > > > JMH on jdk mainline: > > Benchmark ... Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge branch 'master' into JDK-8350463 - Use a smaller warmup and array length in IR test - Update IR test based on the review comment - Merge branch 'jdk:master' into JDK-8350463 - Add the IR test - 8350463: AArch64: Add vector rearrange support for small lane count vectors The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. 
The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. This patch added the rearrange support for vector types with small lane count. Here are the main changes: - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. 1) NEON JMH on panama-vector:vectorIntrinsics: ``` Benchmark (size) Mode Cnt Units Before After Gain Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.999 747.848 10.38x ``` JMH on jdk mainline: ``` Benchmark (SIZE) Mode Cnt Units Before After Gain 
SelectFromBenchmark.rearrangeFromDoubleVector 1024 thrpt 30 ops/ms 44.593 1319.977 29.63x SelectFromBenchmark.rearrangeFromDoubleVector 2048 thrpt 30 ops/ms 22.318 660.061 29.58x SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 30 ops/ms 45.823 1458.144 31.82x SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 30 ops/ms 23.050 729.881 31.67x VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 30 ops/ms 97.210 1082.884 11.14x VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 30 ops/ms 48.642 541.341 11.13x VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 30 ops/ms 24.285 270.419 11.14x VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 30 ops/ms 12.421 135.115 10.88x ``` 2) SVE JMH on panama-vector:vectorIntrinsics: ``` Benchmark (size) Mode Cnt Units Before After Gain Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.396 577.744 7.37x Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.119 2538.261 35.19x Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.992 2536.972 34.75x Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.400 561.934 7.26x Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.858 2949.076 41.61x Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 70.654 2954.273 41.81x Int64Vector.rearrange 1024 thrpt 30 ops/ms 77.851 563.969 7.24x Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 67.433 1510.484 22.39x Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 66.614 1511.617 22.69x Long128Vector.rearrange 1024 thrpt 30 ops/ms 77.637 579.021 7.46x Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 69.886 1274.331 18.23x Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 70.069 1273.787 18.17x ``` JMH on jdk mainline: ``` Benchmark (SIZE) Mode Cnt Units Before After Gain SelectFromBenchmark.rearrangeFromDoubleVector 1024 thrpt 30 ops/ms 44.612 1351.850 30.30x SelectFromBenchmark.rearrangeFromDoubleVector 2048 thrpt 30 ops/ms 22.315 676.314 30.31x SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 30 ops/ms 46.372 1502.036 32.39x 
SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 30 ops/ms 23.361 749.133 32.07x VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 30 ops/ms 97.780 1759.061 17.99x VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 30 ops/ms 48.923 879.584 17.98x VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 30 ops/ms 24.219 439.588 18.15x VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 30 ops/ms 12.416 219.603 17.69x ``` [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L209 [2] https://bugs.openjdk.org/browse/JDK-8310691 ------------- Changes: https://git.openjdk.org/jdk/pull/23790/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23790&range=05 Stats: 510 lines in 6 files changed: 401 ins; 86 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/23790.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23790/head:pull/23790 PR: https://git.openjdk.org/jdk/pull/23790 From rehn at openjdk.org Thu Mar 20 07:23:06 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 20 Mar 2025 07:23:06 GMT Subject: RFR: 8352477: RISC-V: Print warnings when unsupported intrinsics are enabled In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 02:32:18 GMT, Fei Yang wrote: > Hi, please consider this small change. > > `UsePoly1305Intrinsics`, `UseMD5Intrinsics` and `UseSHA1Intrinsics` depend on `!AvoidUnalignedAccesses` and thus are unavailable on platforms with slow unaligned accesses. But these options could still be enabled on the command line, which I think could be suprising to our end users as these intrinsics will only have negative impact on performance numbers for such platforms. It seems to me more reasonable to print warnings and keep them disabled when enabled by the user on such platforms. 
After this change, we have:
>
>
> ubuntu@premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UseMD5Intrinsics -version
> OpenJDK 64-Bit Server VM warning: Intrinsics for MD5 crypto hash functions not available on this CPU.
> openjdk version "25-internal" 2025-09-16
> OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk)
> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing)
>
> ubuntu@premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UsePoly1305Intrinsics -version
> OpenJDK 64-Bit Server VM warning: Intrinsics for Poly1305 crypto hash functions not available on this CPU.
> openjdk version "25-internal" 2025-09-16
> OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk)
> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing)
>
> ubuntu@premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UseSHA1Intrinsics -version
> OpenJDK 64-Bit Server VM warning: Intrinsics for SHA-1 crypto hash functions not available on this CPU.
> openjdk version "25-internal" 2025-09-16
> OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk)
> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing)

Thanks!

-------------

Marked as reviewed by rehn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24123#pullrequestreview-2701536792

From xgong at openjdk.org Thu Mar 20 07:31:09 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Thu, 20 Mar 2025 07:31:09 GMT
Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors
In-Reply-To:
References:
Message-ID:

On Tue, 18 Mar 2025 02:36:44 GMT, Xiaohong Gong wrote:

>>> Thanks for looking at this PR again @eme64 ! Vector API has its own jtreg tests under `test/jdk/jdk/incubator/vector/`. I double checked that it has the `rearrange` test for all vector species. Please see one of the tests here: https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Long128VectorTests.java#L4954 That's also why I did not add correctness tests in the IR test file.
>>
>> Alright. I think result verification would still be good practice, and not too difficult to do using a `@Check` method and `Verify.java` for comparing the resulting arrays. But I leave that up to you. In my experience, the VectorAPI test coverage is not as good as I first thought, see the list of bugs I recently found:
>> https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC
>>
>> So adding a little more rigor to your IR test could catch possible bugs that the existing tests simply do not cover.
>
> Hi @eme64 , the IR test is updated according to your suggestion. Could you please look at it again? Thanks so much!

> @XiaohongGong Could you please also merge here before I rerun the testing?

Sure, I have rebased. Thanks a lot for your testing!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2739452172

From xgong at openjdk.org Thu Mar 20 07:31:10 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Thu, 20 Mar 2025 07:31:10 GMT
Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2]
In-Reply-To:
References: <5fk0CfImI-utRMfmsA78i_uQJjoej9bsLqxsAbRZHLk=.b5aece2b-090d-4fbc-aa64-3acf8fedc41b@github.com>
Message-ID:

On Thu, 20 Mar 2025 06:43:10 GMT, Emanuel Peter wrote:

>> Hi @eme64 , may I know what the test's status is? Seems the failure on linux-x64 is not caused by this PR. Is it possible to move on? Please let me know if there are any other issues. Thanks a lot!
>
> @XiaohongGong Can you please merge with master before I launch testing?

Hi @eme64 I've rebased this PR. Thanks a lot for your testing!
-------------

PR Comment: https://git.openjdk.org/jdk/pull/23608#issuecomment-2739454391

From chagedorn at openjdk.org Thu Mar 20 07:51:13 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 20 Mar 2025 07:51:13 GMT
Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 14 Mar 2025 09:59:59 GMT, Roland Westrelin wrote:

>> The `arraycopy` writes to a non escaping array so its `ArrayCopy` node
>> is marked as having a narrow memory effect. One of the loads from the
>> destination after the copy is transformed into a load from the source
>> array (the rationale being that if there's no load from the
>> destination of the copy, the `arraycopy` is not needed). The load from
>> the source has the input memory state of the `ArrayCopy` as memory
>> input. That load is then sunk out of the loop and its control is
>> updated to be after the `ArrayCopy`. That's legal because the
>> `ArrayCopy` only has a narrow memory effect and can't modify the
>> source. The `ArrayCopy` can't be eliminated and is expanded. In the
>> process, a `MemBar` that has a wide memory effect is added. The load
>> from the source has control after the membar but memory state before
>> and because the membar has a wide memory effect, the load is anti
>> dependent on the membar: the graph is broken (the load can't be pinned
>> after the membar and anti dependent on it).
>>
>> In short, the problem is that the graph is transformed under the
>> assumption that the `ArrayCopy` has a narrow effect but the
>> `ArrayCopy` is expanded to a subgraph that has a wide memory
>> effect. The fix I propose is to not insert a membar with a wide memory
>> effect. We still need a membar when the destination is non escaping
>> because the expanded `ArrayCopy`, if it writes to a tightly allocated
>> array, writes to raw memory and not to the destination memory slice.
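
To make the description above easier to picture, here is a minimal Java sketch of the general code shape it talks about: an `arraycopy` into a freshly allocated array that never escapes, inside a loop, followed by a load from the destination. This is a hypothetical illustration only. It is not the regression test from the PR, the names are invented, and there is no claim that this exact method triggers the assert.

```java
// Hypothetical illustration of the code shape discussed above; not the PR's test.
final class ArrayCopySketch {
    static int sumFirst(int[] src, int iterations) {
        int sum = 0;
        for (int i = 0; i < iterations; i++) {
            // Destination is allocated here and never escapes the method.
            int[] dst = new int[4];
            System.arraycopy(src, 0, dst, 0, 4);
            // A load from the destination right after the copy; the description
            // explains that C2 may turn such a load into a load from `src`.
            sum += dst[0];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumFirst(new int[] {5, 6, 7, 8}, 3));
    }
}
```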
> > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - more > - Merge branch 'master' into JDK-8341976 > - more > - exp > - fix > - Merge branch 'master' into HEAD > - review > - whitespace > - fix & test That looks reasonable. I've launched some testing and results look good so far (there is quite some load at the moment - will take a bit longer to complete than usual). src/hotspot/share/opto/macroArrayCopy.cpp line 826: > 824: } > 825: > 826: if (is_partial_array_copy) { Why is this check no longer required? ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23465#pullrequestreview-2701568242 PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2004987093 From rcastanedalo at openjdk.org Thu Mar 20 08:11:09 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 20 Mar 2025 08:11:09 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: <2OJ3-DbdBBiHjZKVDGNXb-IFLqXi0jI4Ezz_zs6NsMA=.a904b56a-a3c4-4f06-a708-70223beccb60@github.com> Message-ID: On Tue, 18 Mar 2025 14:48:22 GMT, Christian Hagedorn wrote: > * We should make sure that compilation speed is not significantly affected by doing this search on all dying `Type` nodes (maybe @robcasloz can give you some pointers here - he did some compilation time measurements before). I measured C2 speed for this patch on top of jdk-25+14 vs jdk-25+14 using DaCapo23 on two different platforms and do not see any significant effect, see detailed results [here](https://github.com/user-attachments/files/19361310/C2-speed-jdk-25%2B14-vs-JDK-8349479.pdf). 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2739518268

From duke at openjdk.org Thu Mar 20 08:18:26 2025
From: duke at openjdk.org (kuaiwei)
Date: Thu, 20 Mar 2025 08:18:26 GMT
Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v5]
In-Reply-To:
References:
Message-ID:

> In this patch, I extend the merge stores optimization to merge adjacent loads. Tier1 tests pass on my machine.
>
> The benchmark result of MergeLoadBench.java
> AMD EPYC 9T24 96-Core Processor:
>
> |name | -MergeLoads | +MergeLoads |delta|
> |---|---|---|---|
> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 |
> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 |
> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 |
> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 |
> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 |
> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 |
> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 |
> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 |
> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 |
> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 |
> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 |
> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 |
> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 |
> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 |
> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 |
> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 |
> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 |
> |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 |
> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 |
> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 |
> |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 |
> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 |
> |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 |
> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 |
> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 |
> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 |
> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 |
> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 |
> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 |
> |MergeLoadBench.getLongRU |3071.881 |3066.606 | -5.27 |
> |Merg...

kuaiwei has updated the pull request incrementally with one additional commit since the last revision:

  Fix test failure and change for review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24023/files
  - new: https://git.openjdk.org/jdk/pull/24023/files/b621db1c..1eba9308

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=03-04

  Stats: 29 lines in 2 files changed: 20 ins; 1 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/24023.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023

PR: https://git.openjdk.org/jdk/pull/24023

From chagedorn at openjdk.org Thu Mar 20 08:19:15 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 20 Mar 2025 08:19:15 GMT
Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 7 Feb 2025 13:27:00 GMT, Roland Westrelin wrote:

>> This is primarily motivated by 8275202 (C2: optimize out more
>> redundant conditions). In the following code snippet:
>>
>>
>> int[] array = new int[arraySize];
>> if (j <= arraySize) {
>>   if (i >= 0) {
>>     if (i < j) {
>>       int v = array[i];
>>
>>
>> (`arraySize` is a constant)
>>
>> at the range check, `j` is known to be in `[min, arraySize]` as a
>> consequence, `i` is known to be `[0, arraySize-1]`. The range check
>> can be eliminated.
>>
>> Now, if later, `i` constant folds to some value that's positive but
>> out of range for the array:
>>
>> - if that happens when the new pass runs, then it can prove that:
>>
>> if (i < j) {
>>
>> is never taken.
>>
>> - if that happens during IGVN or CCP however, that condition is not
>> constant folded. And because the range check was removed, there's no
>> guard protecting the range check `CastII`. It becomes `top` and, as
>> a result, the graph can become broken.
>>
>> What I propose here is that when the `CastII` becomes dead, any CFG
>> path that uses the `CastII` node is made unreachable. So in pseudo code:
>>
>>
>> int[] array = new int[arraySize];
>> if (j <= arraySize) {
>>   if (i >= 0) {
>>     if (i < j) {
>>       halt();
>>
>>
>> Finding the CFG paths is implemented in the patch by following the
>> uses of the node until a CFG node or a `Phi` is encountered.
>>
>> The patch applies this to all `Type` nodes as, with 8275202, I also ran
>> into some rare corner cases with other types of nodes. The exception is
>> `Phi` nodes which may not be as easy to handle (and for which I had no
>> issue with 8275202).
>>
>> Finally, the patch includes a test case that's unrelated to the
>> discussion of 8275202 above. In that test case, a `CastII` becomes top
>> but the test that guards it doesn't constant fold. The root cause is a
>> transformation of:
>>
>>
>> (CastII (AddI
>>
>>
>> into
>>
>>
>> (AddI (CastII ) (CastII)
>>
>>
>> which causes the resulting node to have a wider type. The `CastII`
>> captures a type before the transformation above happens. Once it has
>> happened, the guard for the `CastII` can't be constant folded when an
>> out of bound value occurs.
>>
>> This is likely fixable some other way (even though it doesn't seem
>> straightforward). Given the long history of similar issues (and the
>> test case that shows that there are more hiding), I think it would
>> make sense to try some other way of approaching them.
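
The snippet in the description above is a fragment. A self-contained version might look like the sketch below, which only wraps the same three nested conditions in a runnable method. The class name, the method name, the value chosen for the constant and the `-1` fallback are all assumptions made for illustration; the sketch shows the source-level shape of the guarded access, not the C2 graph or the failing case.

```java
// Hypothetical, runnable rendering of the guarded array access from the description.
final class RangeCheckSketch {
    static final int ARRAY_SIZE = 10; // stands in for the constant `arraySize`

    static int read(int i, int j) {
        int[] array = new int[ARRAY_SIZE];
        if (j <= ARRAY_SIZE) {
            if (i >= 0) {
                if (i < j) {
                    // Here 0 <= i < j <= ARRAY_SIZE holds, so the access is in
                    // bounds and a compiler may drop the explicit range check.
                    return array[i];
                }
            }
        }
        return -1; // assumed fallback when any guard fails
    }

    public static void main(String[] args) {
        System.out.println(read(3, 5));
        System.out.println(read(7, 5));
        System.out.println(read(-1, 5));
    }
}
```

The point of the proposal is what happens when the three guards and the array access stop agreeing inside the compiler: if `i` later folds to an out-of-range constant, the path through `array[i]` should be treated as unreachable rather than left with a dead `CastII`.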
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
>   review

Thanks Roberto for the evaluation! That looks promising.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2739533560

From duke at openjdk.org Thu Mar 20 08:27:09 2025
From: duke at openjdk.org (kuaiwei)
Date: Thu, 20 Mar 2025 08:27:09 GMT
Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 18 Mar 2025 08:46:01 GMT, Emanuel Peter wrote:

>> kuaiwei has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Revert extract value and add more tests
>
> src/hotspot/share/opto/memnode.cpp line 1853:
>
>> 1851: * +---> Or2 <----+ |
>> 1852: * | |
>> 1853: * +-----> Or3 <------+
>
> The code above has masking, the graph does not. Can you add an explanatory comment, please ;)

Comment added.

> src/hotspot/share/opto/memnode.cpp line 1855:
>
>> 1853: * +-----> Or3 <------+
>> 1854: *
>> 1855: * It will be transformed as a merged LoadI and replace the Or3 node
>
> Suggestion:
>
> * It is transformed as a merged LoadI, which replaces the Or3 node.

Changed.
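
As background for the `Or` chain and the masking mentioned in the comments above, the Java source pattern in question is typically four adjacent byte loads that are masked, shifted and OR-ed into one `int`. The sketch below shows that pattern next to a `ByteBuffer` read used as a reference. It is an assumption about the kind of code the optimization targets, with hypothetical names, and is not copied from `MergeLoadBench.java` or from the patch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical example of the byte-wise load pattern; not taken from the PR.
final class MergeLoadsSketch {
    // Four adjacent byte loads, each masked with 0xff to drop sign extension,
    // shifted into place and OR-ed together (little-endian byte order).
    static int loadIntLittleEndian(byte[] a, int off) {
        return (a[off] & 0xff)
             | ((a[off + 1] & 0xff) << 8)
             | ((a[off + 2] & 0xff) << 16)
             | ((a[off + 3] & 0xff) << 24);
    }

    // The same value read through ByteBuffer, for comparison.
    static int loadIntReference(byte[] a, int off) {
        return ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).getInt(off);
    }

    public static void main(String[] args) {
        byte[] data = {0x78, 0x56, 0x34, 0x12, (byte) 0xff, 0, 0, (byte) 0x80};
        for (int off = 0; off <= 4; off++) {
            int merged = loadIntLittleEndian(data, off);
            int reference = loadIntReference(data, off);
            System.out.printf("off=%d merged=0x%08x reference=0x%08x%n", off, merged, reference);
        }
    }
}
```

The `& 0xff` masks are needed in Java because `byte` is signed; without them a negative byte would sign-extend and corrupt the higher bits of the combined value.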
> src/hotspot/share/opto/memnode.cpp line 1976: > >> 1974: // Go through ConvI2L which is unique output of the load >> 1975: Node* MergePrimitiveLoads::by_pass_i2l(const LoadNode* l) { >> 1976: if ( l != nullptr && l->outcnt() == 1 && l->unique_out()->Opcode() == Op_ConvI2L) { > > Suggestion: > > if (l != nullptr && l->outcnt() == 1 && l->unique_out()->Opcode() == Op_ConvI2L) { Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2005041066 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2005041520 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2005040029 From amitkumar at openjdk.org Thu Mar 20 09:38:10 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 20 Mar 2025 09:38:10 GMT Subject: RFR: 8341908: CodeHeapAnalytics: Output Imperfections and unwanted vm termination [v3] In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 21:06:48 GMT, Lutz Schmidt wrote: >> Output is properly aligned again now. Was messed up when method hotness was removed (part of method sweeper). >> Assertions have been replaced by printing an error message and gracefully returning. Avoids vm crashes caused by diagnostic actions. >> Some code restructuring, removal of redundancies. >> >> Reviews are highly welcomed. > > Lutz Schmidt has updated the pull request incrementally with one additional commit since the last revision: > > 8341908: fix make error One thing I found a bit salty, is how we are printing blobType information. 
Before:

--------------------------------------------------------------------
Address range [0x000003ff883cf000,0x000003ff8844f000), 512k
--------------------------------------------------------------------
compiler method
Addr(module) offset size type lvl blobType Name
0x000003ff883cf008 (+0x00000008) buffer blob flush_icache_stub
0x000003ff883cf408 (+0x00000408) runtime stub Shared Runtime wrong_method_blob
0x000003ff883cf808 (+0x00000808) buffer blob StubRoutines (initialstubs)
0x000003ff883d4a08 (+0x00005a08) runtime stub Shared Runtime throw_StackOverflowError_blob
0x000003ff883d4e08 (+0x00005e08) buffer blob StubRoutines (continuationstubs)
0x000003ff883d5908 (+0x00006908) buffer blob Interpreter
0x000003ff8843c408 (+0x0006d408) adapter blob I2C/C2I adapters
0x000003ff8843c808 (+0x0006d808) adapter blob I2C/C2I adapters
0x000003ff8843cc08 (+0x0006dc08) adapter blob I2C/C2I adapters
0x000003ff8843d008 (+0x0006e008) adapter blob I2C/C2I adapters
0x000003ff8843d408 (+0x0006e408) adapter blob I2C/C2I adapters

With current patch:

--------------------------------------------------------------------
Address range [0x000003ff7c3cf000,0x000003ff7c44f000), 512k
--------------------------------------------------------------------
blob compiler method
Addr(module) offset size type type lvl Name
0x000003ff7c3cf008 (+0x00000008) 0x00000078( 0K) A flush_icache_stub
0x000003ff7c3cf408 (+0x00000408) 0x00000280( 0K) Z Shared Runtime wrong_method_blob
0x000003ff7c3cf808 (+0x00000808) 0x00005118( 20K) A StubRoutines (initialstubs)
0x000003ff7c3d4a08 (+0x00005a08) 0x000001e8( 0K) Z Shared Runtime throw_StackOverflowError_blob
0x000003ff7c3d4e08 (+0x00005e08) 0x00000ac8( 2K) A StubRoutines (continuationstubs)
0x000003ff7c3d5908 (+0x00006908) 0x00066a88( 410K) A Interpreter
0x000003ff7c43c408 (+0x0006d408) 0x00000218( 0K) E I2C/C2I adapters
0x000003ff7c43cb08 (+0x0006db08) 0x000001d0( 0K) E I2C/C2I adapters
0x000003ff7c43d008 (+0x0006e008) 0x000001d8( 0K) E I2C/C2I adapters
0x000003ff7c43d408 (+0x0006e408) 0x000001d8( 0K) E I2C/C2I adapters
0x000003ff7c43d808 (+0x0006e808) 0x00000238( 0K) E I2C/C2I adapters
0x000003ff7c43dc08 (+0x0006ec08) 0x00000228( 0K) E I2C/C2I adapters
0x000003ff7c43e008 (+0x0006f008) 0x00000230( 0K) E I2C/C2I adapters

And then we have to refer to the `typeTable` to understand the type information:

+---------------------------------------------------+
|  Block types used in the following CodeHeap dump  |
+---------------------------------------------------+
  - noType
C - nMethod (under construction), cannot be observed
N - nMethod (active)
I - nMethod (inactive)
X - nMethod (deopt)
Z - runtime stub
U - ricochet stub
R - deopt stub
? - uncommon trap stub
D - exception stub
T - safepoint stub
E - adapter blob
S - MH adapter blob
A - buffer blob
-----------------------------------------------------

This seems a bit overdone, but I am fine with it.

src/hotspot/share/code/codeHeapState.cpp line 594:

> 592: // This is necessary to prevent an unsigned short overflow while accumulating space information.
> 593: //
> 594: if (!(granularity > 0)) {

Suggestion:

if (granularity < 0) {

-------------

PR Review: https://git.openjdk.org/jdk/pull/21452#pullrequestreview-2701705565
PR Review Comment: https://git.openjdk.org/jdk/pull/21452#discussion_r2005064696

From epeter at openjdk.org Thu Mar 20 09:58:11 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 20 Mar 2025 09:58:11 GMT
Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2]
In-Reply-To:
References:
Message-ID: <-L8Nx1MUEYQQe3WABsGddXYoM6G8Ov1Hl8FvlgOa0zI=.a6901ad0-e364-471b-9d88-03ce7c7a4f22@github.com>

On Fri, 7 Feb 2025 13:27:00 GMT, Roland Westrelin wrote:

>> This is primarily motivated by 8275202 (C2: optimize out more
>> redundant conditions).
>> In the following code snippet:
>>
>>
>> int[] array = new int[arraySize];
>> if (j <= arraySize) {
>>   if (i >= 0) {
>>     if (i < j) {
>>       int v = array[i];
>>
>>
>> (`arraySize` is a constant)
>>
>> at the range check, `j` is known to be in `[min, arraySize]`; as a
>> consequence, `i` is known to be in `[0, arraySize-1]`. The range check
>> can be eliminated.
>>
>> Now, if later, `i` constant folds to some value that's positive but
>> out of range for the array:
>>
>> - if that happens when the new pass runs, then it can prove that:
>>
>>     if (i < j) {
>>
>>   is never taken.
>>
>> - if that happens during IGVN or CCP however, that condition is not
>>   constant folded. And because the range check was removed, there's no
>>   guard protecting the range check `CastII`. It becomes `top` and, as
>>   a result, the graph can become broken.
>>
>> What I propose here is that when the `CastII` becomes dead, any CFG
>> path that uses the `CastII` node is made unreachable. So in pseudo code:
>>
>>
>> int[] array = new int[arraySize];
>> if (j <= arraySize) {
>>   if (i >= 0) {
>>     if (i < j) {
>>       halt();
>>
>>
>> Finding the CFG paths is implemented in the patch by following the
>> uses of the node until a CFG node or a `Phi` is encountered.
>>
>> The patch applies this to all `Type` nodes because, with 8275202, I also
>> ran into some rare corner cases with other types of nodes. The exception is
>> `Phi` nodes which may not be as easy to handle (and for which I had no
>> issue with 8275202).
>>
>> Finally, the patch includes a test case that's unrelated to the
>> discussion of 8275202 above. In that test case, a `CastII` becomes top
>> but the test that guards it doesn't constant fold. The root cause is a
>> transformation of:
>>
>>
>> (CastII (AddI
>>
>>
>> into
>>
>>
>> (AddI (CastII ) (CastII)`
>>
>>
>> which causes the resulting node to have a wider type. The `CastII`
>> captures a type before the transformation above happens. Once it has
>> happened, the guard for the `CastII` can't be constant folded when an
>> out of bound value occurs.
>>
>> This is likely fixable some other way (even though it doesn't seem
>> straightforward). Given the long history of similar issues (and the
>> test case that shows that more are hiding), I think it would
>> make sense to try some other way of approaching them.
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
> review

src/hotspot/share/opto/node.cpp line 3110:

> 3108:
> 3109: Node* TypeNode::Ideal(PhaseGVN* phase, bool can_reshape) {
> 3110: if (can_reshape && Value(phase) == Type::TOP) {

Why not use `phase->type(this)`?

test/hotspot/jtreg/compiler/c2/TestGuardOfCastIIDoesntFold.java line 30:

> 28: * @run main/othervm -XX:-TieredCompilation -XX:-BackgroundCompilation -XX:-UseOnStackReplacement
> 29: * -XX:CompileCommand=dontinline,TestGuardOfCastIIDoesntFold::notInlined
> 30: * TestGuardOfCastIIDoesntFold

Nit: can we have a run without flags, please ;)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2005101901
PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2005044876

From dnsimon at openjdk.org Thu Mar 20 10:27:52 2025
From: dnsimon at openjdk.org (Doug Simon)
Date: Thu, 20 Mar 2025 10:27:52 GMT
Subject: RFR: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer
Message-ID:

This PR addresses undefined behavior in CodeBuffer by making `verify_section_allocation` return early for a partially initialized CodeBuffer.
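As an illustration of the kind of early return described above (a minimal sketch under stated assumptions, not the actual HotSpot change): in C++, adding a non-zero offset to a null pointer is undefined behavior, which is what ubsan reports here. The `Section` struct and the `verify_section` function below are hypothetical stand-ins; they skip all pointer arithmetic while the start pointer is still null.

```cpp
#include <cstddef>

// Hypothetical stand-in for a buffer section; not the HotSpot CodeSection.
struct Section {
  char*       start    = nullptr;  // stays null until the buffer is fully set up
  char*       pos      = nullptr;  // current write position
  std::size_t capacity = 0;        // may already be recorded before start is set
};

// Returns true when the section is uninitialized or when pos lies inside
// [start, start + capacity]. The early return avoids evaluating
// `start + capacity` on a null start pointer.
inline bool verify_section(const Section& s) {
  if (s.start == nullptr) {
    return true;  // partially initialized: nothing to verify yet
  }
  return s.pos >= s.start && s.pos <= s.start + s.capacity;
}
```

The design choice in this sketch is to treat "not yet initialized" as trivially valid, so a verification pass can run at any point during setup without touching the null pointer.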
------------- Commit messages: - short-circuit verify_section_allocation for partially initialized buffer Changes: https://git.openjdk.org/jdk/pull/24118/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24118&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352420 Stats: 3 lines in 2 files changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24118.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24118/head:pull/24118 PR: https://git.openjdk.org/jdk/pull/24118 From mbaesken at openjdk.org Thu Mar 20 10:27:52 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 20 Mar 2025 10:27:52 GMT Subject: RFR: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 15:43:54 GMT, Doug Simon wrote: > This PR addresses undefined behavior in CodeBuffer by making `verify_section_allocation` return early for a partially initialized CodeBuffer. Yeah we had a couple of those shift issues see https://github.com/openjdk/jdk/pull/24118#issuecomment-2739695432 Not sure why I do not see those, have to check. > Yeah we had a couple of those shift issues see [#24118 (comment)](https://github.com/openjdk/jdk/pull/24118#issuecomment-2739695432) Not sure why I do not see those, have to check. I found why it builds on my side - I set the additional configure parameter for macOS aarch64 `--enable-ubsan --with-additional-ubsan-checks=-fno-sanitize=shift-exponent` With your patch applied I do not see the failure any more, compiler/jvmci/errors/TestInvalidCompilationResult runs successfully. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24118#issuecomment-2739724304
PR Comment: https://git.openjdk.org/jdk/pull/24118#issuecomment-2739835042
PR Comment: https://git.openjdk.org/jdk/pull/24118#issuecomment-2739851580

From dnsimon at openjdk.org Thu Mar 20 10:27:52 2025
From: dnsimon at openjdk.org (Doug Simon)
Date: Thu, 20 Mar 2025 10:27:52 GMT
Subject: RFR: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer
In-Reply-To:
References:
Message-ID: <1oae-YJw5URq4bn0OLRAWhs2WhdMWATHEUW5LFw5wrM=.203f448c-817a-4930-8230-8e408f902f75@github.com>

On Wed, 19 Mar 2025 15:43:54 GMT, Doug Simon wrote:

> This PR addresses undefined behavior in CodeBuffer by making `verify_section_allocation` return early for a partially initialized CodeBuffer.

@MBaesken can you please test this change on a ubsan enabled build or remind me how I can do that myself.

Unfortunately, looks like there's another ubsan issue blocking my way:

~/d/jdk-jdk/open (master)> make CONF_NAME=macosx-aarch64 LOG=info TEST=compiler/jvmci/errors/TestInvalidCompilationResult.java test
Building configuration 'macosx-aarch64' (matching CONF_NAME=macosx-aarch64)
Generating main target list
Building configuration 'macosx-aarch64' (matching CONF_NAME=macosx-aarch64)
Running make as '/Applications/Xcode.app/Contents/Developer/usr/bin/make CONF_NAME=macosx-aarch64 LOG=info TEST=compiler/jvmci/errors/TestInvalidCompilationResult.java test'
Building target 'test' in configuration 'macosx-aarch64'
Building JVM variant 'server' with features 'cds compiler1 compiler2 dtrace epsilongc g1gc jfr jni-check jvmci jvmti management parallelgc serialgc services vm-structs zgc'
ad_aarch64.hpp:7096:11: runtime error: shift exponent 100 is too large for 32-bit type 'uint' (aka 'unsigned int')
#0 0x105728714 in Pipeline_Use_Element::step(unsigned int) ad_aarch64.hpp:7150
#1 0x105721bf8 in Pipeline_Use::step(unsigned int) ad_aarch64.hpp:7198
#2 0x105724630 in Scheduling::AddNodeToBundle(Node*, Block const*) output.cpp:2553
#3 0x105722b40 in Scheduling::DoScheduling() output.cpp:2816
#4 0x10571745c in PhaseOutput::ScheduleAndBundle() output.cpp:2167
#5 0x1057147c8 in PhaseOutput::Output() output.cpp:341
#6 0x104b6dff8 in Compile::Code_Gen() compile.cpp:3082
#7 0x104b6af48 in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) compile.cpp:893
#8 0x104b6e0e0 in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) compile.cpp:695
#9 0x1049f58c8 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) c2compiler.cpp:141
#10 0x104b928ec in CompileBroker::invoke_compiler_on_method(CompileTask*) compileBroker.cpp:2331
#11 0x104b91aec in CompileBroker::compiler_thread_loop() compileBroker.cpp:1975
#12 0x10519809c in JavaThread::thread_main_inner() javaThread.cpp:776
#13 0x105197d50 in JavaThread::run() javaThread.cpp:761
#14 0x1059e1b28 in Thread::call_run() thread.cpp:231
#15 0x1056ff3bc in thread_native_entry(Thread*) os_bsd.cpp:601
#16 0x1810602e0 in _pthread_start+0x84 (libsystem_pthread.dylib:arm64e+0x72e0) (BuildId: 642faf7a874e37e68aba2b0cc09a302532000000200000000100000000030f00)
#17 0x18105b0f8 in thread_start+0x4 (libsystem_pthread.dylib:arm64e+0x20f8) (BuildId: 642faf7a874e37e68aba2b0cc09a302532000000200000000100000000030f00)

This build is using Xcode 14.3.1.
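For context on the report above, a minimal sketch (the helper `shift_left_or_zero` is hypothetical and is not how HotSpot addresses this): shifting a 32-bit value by 32 or more is undefined behavior in C++, so a defined result has to be chosen explicitly for large shift counts.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of HotSpot: a left shift that stays defined
// for any count by returning 0 once every bit would be shifted out.
inline std::uint32_t shift_left_or_zero(std::uint32_t value, unsigned count) {
  constexpr unsigned width = std::numeric_limits<std::uint32_t>::digits;  // 32
  return count >= width ? 0u : value << count;
}
```

With a guard of this kind, a count such as the 100 seen in the trace yields 0 instead of triggering the ubsan `shift-exponent` check; whether 0 is the right result depends on what the caller expects.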
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24118#issuecomment-2737144179
PR Comment: https://git.openjdk.org/jdk/pull/24118#issuecomment-2739695432

From mbaesken at openjdk.org Thu Mar 20 10:27:52 2025
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Thu, 20 Mar 2025 10:27:52 GMT
Subject: RFR: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer
In-Reply-To: <1oae-YJw5URq4bn0OLRAWhs2WhdMWATHEUW5LFw5wrM=.203f448c-817a-4930-8230-8e408f902f75@github.com>
References: <1oae-YJw5URq4bn0OLRAWhs2WhdMWATHEUW5LFw5wrM=.203f448c-817a-4930-8230-8e408f902f75@github.com>
Message-ID:

On Wed, 19 Mar 2025 15:47:42 GMT, Doug Simon wrote:

> or remind me how I can do that myself.

Hi @dougxc you have to use the --enable-ubsan configure option; and probably you need Xcode 15 on macOS aarch64 (with Xcode 13 we had problems using ubsan).
But I can also test your change later.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24118#issuecomment-2739593454

From bulasevich at openjdk.org Thu Mar 20 10:49:09 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Thu, 20 Mar 2025 10:49:09 GMT
Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 19 Mar 2025 17:18:50 GMT, Vladimir Kozlov wrote:

>> src/hotspot/share/code/codeBlob.cpp line 156:
>>
>>> 154: } else {
>>> 155: // We need unique and valid not null address
>>> 156: _mutable_data = blob_end();
>>
>> It makes me a little nervous pointing this value to real data. When RelocIterator computes `_current = nm->relocation_begin() - 1`, it should never read or write from that address, but how can we guarantee that? Any non-null address that is guaranteed unmapped would do, or a special protected page like `bad_page` here: https://github.com/openjdk/jdk/blob/8e530633a9d99d7ce585cafd5573cb89212feee7/src/hotspot/share/runtime/safepointMechanism.cpp#L66. If using protected memory seems like overkill, then I suggest using a static. Something like this:
>>
>> static union {
>>   relocInfo _dummy[1];
>> } _empty[2];
>> [...]
>> _mutable_data = _empty+1;
>>
>> However, I think this is not the first time we have run into this issue with RelocIterator. Maybe it's time that we rewrote it to avoid this situation?
>
> How about this?:
>
> +++ b/src/hotspot/share/code/relocInfo.cpp
> @@ -117,6 +117,8 @@ void relocInfo::change_reloc_info_for_address(RelocIterator *itr, address pc, re
>  // Implementation of RelocIterator
>
> +static relocInfo dummy_reloc[2];
> +
>  void RelocIterator::initialize(nmethod* nm, address begin, address limit) {
>    initialize_misc();
>
> @@ -127,8 +129,14 @@ void RelocIterator::initialize(nmethod* nm, address begin, address limit) {
>    guarantee(nm != nullptr, "must be able to deduce nmethod from other arguments");
>
>    _code = nm;
> -  _current = nm->relocation_begin() - 1;
> -  _end = nm->relocation_end();
> +  // Check for no relocations case and use dummy data to avoid referencing wrong data.
> +  if (nm->relocation_size() == 0) {
> +    _current = dummy_reloc;
> +    _end = dummy_reloc + 1;
> +  } else {
> +    _current = nm->relocation_begin() - 1;
> +    _end = nm->relocation_end();
> +  }
>    _addr = nm->content_begin();
>
>    // Initialize code sections.
>
>
> I filed RFE: [JDK-8352426](https://bugs.openjdk.org/browse/JDK-8352426)

We can just add nullptr checks before pointer arithmetic in relocInfo:

diff --git a/src/hotspot/share/code/relocInfo.cpp b/src/hotspot/share/code/relocInfo.cpp
index 7aae32759dd..c694f21e5ca 100644
--- a/src/hotspot/share/code/relocInfo.cpp
+++ b/src/hotspot/share/code/relocInfo.cpp
@@ -127,7 +127,8 @@ void RelocIterator::initialize(nmethod* nm, address begin, address limit) {
   guarantee(nm != nullptr, "must be able to deduce nmethod from other arguments");

   _code = nm;
-  _current = nm->relocation_begin() - 1;
+  _current = nm->relocation_begin();
+  if (_current != nullptr) { _current--; }
   _end = nm->relocation_end();
   _addr = nm->content_begin();

diff --git a/src/hotspot/share/code/relocInfo.hpp b/src/hotspot/share/code/relocInfo.hpp
index 25cca49e50b..b440e713493 100644
--- a/src/hotspot/share/code/relocInfo.hpp
+++ b/src/hotspot/share/code/relocInfo.hpp
@@ -603,7 +603,7 @@ class RelocIterator : public StackObj {

   // get next reloc info, return !eos
   bool next() {
-    _current++;
+    if (_current != nullptr) { _current++; }
     assert(_current <= _end, "must not overrun relocInfo");
     if (_current == _end) {
       set_has_current(false);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24102#discussion_r2005301702

From thartmann at openjdk.org Thu Mar 20 12:25:12 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 20 Mar 2025 12:25:12 GMT
Subject: RFR: 8314999: IR framework fails to detect allocation [v6]
In-Reply-To:
References:
Message-ID:

On Wed, 19 Mar 2025 13:26:51 GMT, Marc Chevalier wrote:

>> Bring the `ALLOC(_ARRAY)?(_OF)?` IR framework regex into the modern era!
>>
>> Rather than matching on the OptoAssembly, we now match before macro expansion. Indeed, matching on OptoAssembly is fragile: between the register being assigned the class to allocate and the actual call to `_new_instance_Java`, there might
>> be register spill, making the match hard, and fragile.
Now, these regex are applied before macro expansion. >> >> To make that work, I needed to adapt the dump_spec of `AllocateNode` to print the allocated class, in case it is a precise constant. This is also nice to have as a human reading the output. >> >> The new feature is also slightly more precise in case we want to match the allocation of a given class (that is `ALLOC(_ARRAY)?_OF`). It used to be along the lines of: >> >> "precise .*" + IS_REPLACED + ":" >> >> which is actually too lenient: it only assert the suffix is what is expected. On the plus side, if we wanted to have `MyClass`, then `some/package/MyClass` would match, but on the other hand, `ItLooksLikeButIsNotMyClass` would also match. The new regex use a word boundary: >> >> "allocationKlass:.*\\b" + IS_REPLACED + "\\s" >> >> which make it a bit more specific: the given name cannot be extended with only letters: there must be a non-letter char. For instance, `ItLooksLikeButIsNotMyClass` wouldn't work anymore, since there is no word boundary between `ItLooksLikeButIsNot` and `MyClass`. It can also not be extended on the right into `MyClassButNotReally` because of the space character that must exist after the searched string. >> >> The case of array allocations is slightly more tricky, but essentially similar. >> >> It is not quite fool-proof since a package path can still be extended, e.g. >> >> @IR(counts = {IRNode.ALLOC_OF, "some/package/MyClass", "1"}) >> >> will also match allocations of `a/prefix/some/package/MyClass`. >> >> I think it's an acceptable limitation. >> >> Let's have a little preview of the new `dump_spec`. 
For instance, it could have been something like: >> >> 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) !jvms: Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) >> >> and now it is >> >> 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/l... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright year Nice, through description and comments. The changes look good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24093#pullrequestreview-2702504733 From epeter at openjdk.org Thu Mar 20 12:26:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Mar 2025 12:26:09 GMT Subject: RFR: 8352131: [REDO] C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: <_1_FOdNfmiNZYLckVHCJsnWjXnDgPoSxrMqwc9FmJvc=.b66f1b60-e3a6-4e3c-833f-1f7f2d65eaee@github.com> References: <_1_FOdNfmiNZYLckVHCJsnWjXnDgPoSxrMqwc9FmJvc=.b66f1b60-e3a6-4e3c-833f-1f7f2d65eaee@github.com> Message-ID: <-YNJEmyPwjK8Qi_9RkrZlcl3zTzRmzOmLiJvpzREDpI=.cb5d422c-1958-4021-9023-c99e7428dfee@github.com> On Wed, 19 Mar 2025 15:08:56 GMT, Christian Hagedorn wrote: > We currently only print a compilation bailout with `-XX:+PrintCompilation`: > > 7782 90 b 4 Test::main (19 bytes) > 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout > > But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. 
> > The original fix with https://github.com/openjdk/jdk/pull/24031 missed the following: We release the memory for the directives here: > https://github.com/openjdk/jdk/blob/fed34e46b89bc9b0462d9b5f5e5ab5516fe18c6e/src/hotspot/share/compiler/compileBroker.cpp#L2343 > > and then wrongly accessed the memory again to fetch `PrintCompilationOption` on this line: > https://github.com/openjdk/jdk/blob/fed34e46b89bc9b0462d9b5f5e5ab5516fe18c6e/src/hotspot/share/compiler/compileBroker.cpp#L2377 > > This worked most of the time because the memory was not overridden, yet, but of course is a wrong use after free case. We only noticed this in some intermittent test failures where we suddenly dumped `COMPILE SKIPPED` where it was unexpected for tests. > > I now moved the memory release down after we access it for `PrintCompilationOption`. > > Thanks, > Christian Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24117#pullrequestreview-2702507759 From thartmann at openjdk.org Thu Mar 20 12:29:22 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 20 Mar 2025 12:29:22 GMT Subject: RFR: 8346989: Deoptimization and re-compilation cycle with C2 compiled code In-Reply-To: <_3l8ylsbgvsqQE1Ihp0BUAx2o_VzcS6R2jWBSKW9u1E=.0dcb6086-ff6f-4c9a-b990-6665a476a3dc@github.com> References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> <_3l8ylsbgvsqQE1Ihp0BUAx2o_VzcS6R2jWBSKW9u1E=.0dcb6086-ff6f-4c9a-b990-6665a476a3dc@github.com> Message-ID: On Fri, 7 Mar 2025 18:03:14 GMT, Vladimir Ivanov wrote: >> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. 
>>
>> tl;dr:
>> - C1: no problem, no change
>> - C2:
>>   - with intrinsics:
>>     - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
>>     - without overflow: no problem, no change
>>   - without intrinsics: no problem, no change
>>
>> Before the fix:
>>
>> Benchmark                              (SIZE)  Mode  Cnt    Score     Error  Units
>> MathExact.C1_1.loopAddIInBounds       1000000  avgt    3    1.272 ±   0.048  ms/op
>> MathExact.C1_1.loopAddIOverflow       1000000  avgt    3  641.917 ±  58.238  ms/op
>> MathExact.C1_1.loopAddLInBounds       1000000  avgt    3    1.402 ±   0.842  ms/op
>> MathExact.C1_1.loopAddLOverflow       1000000  avgt    3  671.013 ± 229.425  ms/op
>> MathExact.C1_1.loopDecrementIInBounds 1000000  avgt    3    3.722 ±  22.244  ms/op
>> MathExact.C1_1.loopDecrementIOverflow 1000000  avgt    3  653.341 ± 279.003  ms/op
>> MathExact.C1_1.loopDecrementLInBounds 1000000  avgt    3    2.525 ±   0.810  ms/op
>> MathExact.C1_1.loopDecrementLOverflow 1000000  avgt    3  656.750 ± 141.792  ms/op
>> MathExact.C1_1.loopIncrementIInBounds 1000000  avgt    3    4.621 ±  12.822  ms/op
>> MathExact.C1_1.loopIncrementIOverflow 1000000  avgt    3  651.608 ± 274.396  ms/op
>> MathExact.C1_1.loopIncrementLInBounds 1000000  avgt    3    2.576 ±   3.316  ms/op
>> MathExact.C1_1.loopIncrementLOverflow 1000000  avgt    3  662.216 ±  71.879  ms/op
>> MathExact.C1_1.loopMultiplyIInBounds  1000000  avgt    3    1.402 ±   0.587  ms/op
>> MathExact.C1_1.loopMultiplyIOverflow  1000000  avgt    3  615.836 ± 252.137  ms/op
>> MathExact.C1_1.loopMultiplyLInBounds  1000000  avgt    3    2.906 ±   5.718  ms/op
>> MathExact.C1_1.loopMultiplyLOverflow  1000000  avgt    3  655.576 ± 147.432  ms/op
>> MathExact.C1_1.loopNegateIInBounds    1000000  avgt    3    2.023 ±   0.027  ms/op
>> MathExact.C1_1.loopNegateIOverflow    1000000  avgt    3  639.136 ±  30.841  ms/op
>> MathExact.C1_1.loop...
>
> src/hotspot/share/opto/library_call.cpp line 1963:
>
>> 1961: set_i_o(i_o());
>> 1962:
>> 1963: uncommon_trap(Deoptimization::Reason_intrinsic,
>
> What about using `builtin_throw` here? (Requires some tuning on `builtin_throw` side.)
How much does it affect performance? Also, passing `must_throw = true` into `uncommon_trap` may help a bit here as well. I think adapting and re-using `builtin_throw` like you described is reasonable but I let @iwanowww confirm :slightly_smiling_face: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r2005526386 From chagedorn at openjdk.org Thu Mar 20 12:35:23 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 20 Mar 2025 12:35:23 GMT Subject: RFR: 8352131: [REDO] C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: <_1_FOdNfmiNZYLckVHCJsnWjXnDgPoSxrMqwc9FmJvc=.b66f1b60-e3a6-4e3c-833f-1f7f2d65eaee@github.com> References: <_1_FOdNfmiNZYLckVHCJsnWjXnDgPoSxrMqwc9FmJvc=.b66f1b60-e3a6-4e3c-833f-1f7f2d65eaee@github.com> Message-ID: On Wed, 19 Mar 2025 15:08:56 GMT, Christian Hagedorn wrote: > We currently only print a compilation bailout with `-XX:+PrintCompilation`: > > 7782 90 b 4 Test::main (19 bytes) > 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout > > But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. > > The original fix with https://github.com/openjdk/jdk/pull/24031 missed the following: We release the memory for the directives here: > https://github.com/openjdk/jdk/blob/fed34e46b89bc9b0462d9b5f5e5ab5516fe18c6e/src/hotspot/share/compiler/compileBroker.cpp#L2343 > > and then wrongly accessed the memory again to fetch `PrintCompilationOption` on this line: > https://github.com/openjdk/jdk/blob/fed34e46b89bc9b0462d9b5f5e5ab5516fe18c6e/src/hotspot/share/compiler/compileBroker.cpp#L2377 > > This worked most of the time because the memory was not overridden, yet, but of course is a wrong use after free case. We only noticed this in some intermittent test failures where we suddenly dumped `COMPILE SKIPPED` where it was unexpected for tests. > > I now moved the memory release down after we access it for `PrintCompilationOption`. 
> > Thanks, > Christian Thanks Emanuel for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24117#issuecomment-2740305151 From chagedorn at openjdk.org Thu Mar 20 12:35:24 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 20 Mar 2025 12:35:24 GMT Subject: Integrated: 8352131: [REDO] C2: Print compilation bailouts with PrintCompilation compile command In-Reply-To: <_1_FOdNfmiNZYLckVHCJsnWjXnDgPoSxrMqwc9FmJvc=.b66f1b60-e3a6-4e3c-833f-1f7f2d65eaee@github.com> References: <_1_FOdNfmiNZYLckVHCJsnWjXnDgPoSxrMqwc9FmJvc=.b66f1b60-e3a6-4e3c-833f-1f7f2d65eaee@github.com> Message-ID: On Wed, 19 Mar 2025 15:08:56 GMT, Christian Hagedorn wrote: > We currently only print a compilation bailout with `-XX:+PrintCompilation`: > > 7782 90 b 4 Test::main (19 bytes) > 7792 90 b 4 Test::main (19 bytes) COMPILE SKIPPED: StressBailout > > But not when using `-XX:CompileCommand=printcompilation,*::*`. This patch enables this. > > The original fix with https://github.com/openjdk/jdk/pull/24031 missed the following: We release the memory for the directives here: > https://github.com/openjdk/jdk/blob/fed34e46b89bc9b0462d9b5f5e5ab5516fe18c6e/src/hotspot/share/compiler/compileBroker.cpp#L2343 > > and then wrongly accessed the memory again to fetch `PrintCompilationOption` on this line: > https://github.com/openjdk/jdk/blob/fed34e46b89bc9b0462d9b5f5e5ab5516fe18c6e/src/hotspot/share/compiler/compileBroker.cpp#L2377 > > This worked most of the time because the memory was not overridden, yet, but of course is a wrong use after free case. We only noticed this in some intermittent test failures where we suddenly dumped `COMPILE SKIPPED` where it was unexpected for tests. > > I now moved the memory release down after we access it for `PrintCompilationOption`. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 2560a637 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/2560a63773ad8223e42d3ecf5bdcaaec30b001ee Stats: 4 lines in 1 file changed: 2 ins; 1 del; 1 mod 8352131: [REDO] C2: Print compilation bailouts with PrintCompilation compile command Reviewed-by: thartmann, kvn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/24117 From epeter at openjdk.org Thu Mar 20 12:38:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Mar 2025 12:38:16 GMT Subject: RFR: 8335708: C2: Compile::verify_graph_edges must start at root and safepoints, just like CCP traversal [v2] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 13:57:33 GMT, Marc Chevalier wrote: >> In CCP, we transform the nodes going up (toward inputs) starting from root and safepoints because infinite loops can be reachable from the root, but not co-reachable from the root, that is one can follow def-use from root to the loop, but not the use-def from root to loop. For more details, see: >> https://github.com/openjdk/jdk/blob/4cf63160ad575d49dbe70f128cd36aba22b8f2ff/src/hotspot/share/opto/phaseX.cpp#L2063-L2070 >> >> Since we are specifically marking nodes as useful if they are above a safepoint, the check that no dead nodes must be there anymore must also consider nodes above a safepoint as alive: the same criterion must apply. We should nevertheless not start from a safepoint killed by CCP. >> >> About the test, I use this trick found in `TestInfiniteLoopCCP` because I indeed need a really infinite loop, but I want a terminating test. The crash is not deterministic, as it needs StressIGVN, so I did a bit of stats. Using a little helper script, on 100 runs, 69 runs fail as in the JBS ticket and 31 are successful (so 0 fail in another way). After the fix, I find 100 successes. >> >> And thanks to @eme64 who extracted such a concise reproducer. 
> > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > various fixes @marc-chevalier Looks good to me, except for missing punctuation ;) src/hotspot/share/opto/compile.hpp line 1233: > 1231: // Graph verification code > 1232: // Walk the node list, verifying that there is a one-to-one correspondence > 1233: // between Use-Def edges and Def-Use edges The option no_dead_code enables Suggestion: // between Use-Def edges and Def-Use edges. The option no_dead_code enables ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23977#pullrequestreview-2702539899 PR Review Comment: https://git.openjdk.org/jdk/pull/23977#discussion_r2005539542 From duke at openjdk.org Thu Mar 20 12:40:32 2025 From: duke at openjdk.org (Anjian-Wen) Date: Thu, 20 Mar 2025 12:40:32 GMT Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction Message-ID: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> support Zvbb Vector And-Not draft ------------- Commit messages: - RISC-V: C2: Support Zvbb Vector And-Not instruction Changes: https://git.openjdk.org/jdk/pull/24129/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329887 Stats: 133 lines in 2 files changed: 133 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24129.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24129/head:pull/24129 PR: https://git.openjdk.org/jdk/pull/24129 From chagedorn at openjdk.org Thu Mar 20 12:49:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 20 Mar 2025 12:49:15 GMT Subject: RFR: 8335708: C2: Compile::verify_graph_edges must start at root and safepoints, just like CCP traversal [v2] In-Reply-To: References: Message-ID: <6kmH4d0GY7waWekPmv3agR7Hy4Wh7hcNE-DwP8iUfQU=.9aed0941-2a27-4b6c-af5e-a363c2525a48@github.com> On Tue, 18 Mar 2025 13:57:33 
GMT, Marc Chevalier wrote: >> In CCP, we transform the nodes going up (toward inputs) starting from root and safepoints because infinite loops can be reachable from the root, but not co-reachable from the root, that is one can follow def-use from root to the loop, but not the use-def from root to loop. For more details, see: >> https://github.com/openjdk/jdk/blob/4cf63160ad575d49dbe70f128cd36aba22b8f2ff/src/hotspot/share/opto/phaseX.cpp#L2063-L2070 >> >> Since we are specifically marking nodes as useful if they are above a safepoint, the check that no dead nodes must be there anymore must also consider nodes above a safepoint as alive: the same criterion must apply. We should nevertheless not start from a safepoint killed by CCP. >> >> About the test, I use this trick found in `TestInfiniteLoopCCP` because I indeed need a really infinite loop, but I want a terminating test. The crash is not deterministic, as it needs StressIGVN, so I did a bit of stats. Using a little helper script, on 100 runs, 69 runs fail as in the JBS ticket and 31 are successful (so 0 fail in another way). After the fix, I find 100 successes. >> >> And thanks to @eme64 who extracted such a concise reproducer. > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > various fixes Some minor comments. Otherwise, looks good to me, too! 
src/hotspot/share/opto/compile.cpp line 4208: > 4206: if (root_and_safepoints != nullptr) { > 4207: assert(root_and_safepoints->member(_root), "root is not in root_and_safepoints"); > 4208: for (unsigned int i = 0, limit = root_and_safepoints->size(); i < limit; i++) { You should use `uint` instead: Suggestion: for (uint i = 0, limit = root_and_safepoints->size(); i < limit; i++) { src/hotspot/share/opto/compile.cpp line 4210: > 4208: for (unsigned int i = 0, limit = root_and_safepoints->size(); i < limit; i++) { > 4209: Node* root_or_safepoint = root_and_safepoints->at(i); > 4210: // If the node is a safepoint, let's check it still has a control input Suggestion: // If the node is a safepoint, let's check if it still has a control input. src/hotspot/share/opto/compile.cpp line 4211: > 4209: Node* root_or_safepoint = root_and_safepoints->at(i); > 4210: // If the node is a safepoint, let's check it still has a control input > 4211: // Lack of control input signified that this node was killed by CCP or Suggestion: // Lack of control input signifies that this node was killed by CCP or src/hotspot/share/opto/compile.hpp line 1242: > 1240: // > 1241: // To call this function, there are 2 ways to go: > 1242: // - give root_and_safepoints to start traversal everywhere needed (like after CCP), Suggestion: // - give root_and_safepoints to start traversal everywhere needed (like after CCP) ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23977#pullrequestreview-2702571571 PR Review Comment: https://git.openjdk.org/jdk/pull/23977#discussion_r2005557198 PR Review Comment: https://git.openjdk.org/jdk/pull/23977#discussion_r2005558259 PR Review Comment: https://git.openjdk.org/jdk/pull/23977#discussion_r2005558468 PR Review Comment: https://git.openjdk.org/jdk/pull/23977#discussion_r2005560678 From thartmann at openjdk.org Thu Mar 20 13:09:18 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 20 Mar 2025 13:09:18 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v5] In-Reply-To: References: Message-ID: <4gi1QLJRikwQR2ShA9zy_cOK4NDsjrJK4ZyyuzuNLjc=.924da387-362c-4a39-b4da-0d347c72d354@github.com> On Mon, 10 Mar 2025 09:49:26 GMT, Damon Fenacci wrote: >> # Issue >> >> The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2 compiling the test `testAndMaskSameValue1` expects to have 1 `AndV` node but it has none. >> >> # Cause >> >> The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type. >> The graph that leads to the issue looks like this: >> ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) >> The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: >> ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) >> The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead. 
>> The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`. >> >> This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match. >> >> # Solution >> >> Instead of fixing this specific case we try a more generic approach: when late inlining we keep track of failed intrinsics and re-examine them during IGVN. If the `Ideal` method for their call node is called, we reschedule the intrinsic attempt for that call. >> >> # Testing >> >> Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) >> >> Tests: Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) > > Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: > > - JDK-8302459: refactor helper method > - JDK-8302459: reshape infinite loop check This was a tricky one to narrow down, good job Damon! :slightly_smiling_face: I added a few code style comments, looks good otherwise. 
src/hotspot/share/opto/callnode.cpp line 1114: > 1112: } else { > 1113: assert(IncrementalInline, "required"); > 1114: assert(cg->method()->is_method_handle_intrinsic() == false, "required"); Suggestion: assert(!cg->method()->is_method_handle_intrinsic(), "required"); src/hotspot/share/opto/callnode.cpp line 1117: > 1115: if (phase->C->print_inlining()) { > 1116: phase->C->inline_printer()->record(cg->method(), cg->call_node()->jvms(), InliningResult::FAILURE, > 1117: "static call node changed: trying again"); FTR, could you share how the PrintInlining output looks now when this code is triggered? src/hotspot/share/opto/callnode.cpp line 1215: > 1213: bool call_does_dispatch; > 1214: ciMethod* callee = phase->C->optimize_virtual_call(caller, klass, holder, orig_callee, receiver_type, true /*is_virtual*/, > 1215: call_does_dispatch, not_used3); // out-parameters Suggestion: call_does_dispatch, not_used3); // out-parameters src/hotspot/share/opto/compile.cpp line 2050: > 2048: assert(is_scheduled_for_igvn_before == is_scheduled_for_igvn_after, "call node removed from IGVN list during inlining pass"); > 2049: cg->call_node()->set_generator(cg); > 2050: } I find this a bit hard to read. Wouldn't it be semantically equivalent to this? if (is_scheduled_for_igvn_before == is_scheduled_for_igvn_after) { cg->call_node()->set_generator(cg); } else { assert(false, "Some useful message"); } We wouldn't have separate asserts for the two cases, but I think that's fine since one can easily figure it out from the boolean values. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21682#pullrequestreview-2702546897 PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r2005543474 PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r2005569420 PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r2005570672 PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r2005598996 From epeter at openjdk.org Thu Mar 20 13:29:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Mar 2025 13:29:14 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v15] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 10:58:14 GMT, Marc Chevalier wrote: >> This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). Care must be taken in the case con1 + con2 is bigger than the number of bits in the integer type. In this case, we must simplify to 0. >> >> Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32 bits into a 16 bits integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simple `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR looks like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitude. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. >> >> This also works for multiplications by powers of 2 since they are already translated into shifts. 
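The arithmetic behind the shift-collapsing description above can be checked in plain Java. This is only an illustrative sketch at the source level, not the C2 code from the PR; the class name `ShiftCollapse` and its helper methods are hypothetical, and the constants are assumed to satisfy 0 <= con1, con2 < 32. It shows why the collapsed form must become 0 once con1 + con2 reaches the bit width (Java masks an `int` shift count to its low 5 bits), and why `(y << 19) >> 16` agrees with `((y << 3) << 16) >> 16` on the 16 bits kept by a `short` store.

```java
class ShiftCollapse {
    // Original form: two left shifts by constants, each assumed to be in [0, 32).
    static int chained(int x, int con1, int con2) {
        return (x << con1) << con2;
    }

    // Collapsed form. A plain x << (con1 + con2) would be wrong when the sum
    // is >= 32, because Java only uses the low 5 bits of an int shift count.
    static int collapsed(int x, int con1, int con2) {
        int sum = con1 + con2;
        return sum >= Integer.SIZE ? 0 : x << sum;
    }

    // Sign-extension idiom followed by a 16-bit store (modelled by the short cast).
    static short storeOld(int y) {
        return (short) (((y << 3) << 16) >> 16);
    }

    // Same store after collapsing the two left shifts into one.
    static short storeNew(int y) {
        return (short) ((y << 19) >> 16);
    }

    public static void main(String[] args) {
        int x = 0x12345678;
        System.out.println(chained(x, 3, 16) == collapsed(x, 3, 16));
        System.out.println(chained(x, 20, 20) == collapsed(x, 20, 20));
        System.out.println((x << 40) == (x << 8)); // shift count is masked to 40 & 31 = 8
        System.out.println(storeOld(x) == storeNew(x) && storeNew(x) == (short) (x << 3));
    }
}
```

All four comparisons hold for any `int` input under the stated assumptions on the constants; the higher bits that differ between the old and new forms are the ones the partial-word store discards.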
>> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > more checks src/hotspot/share/opto/memnode.cpp line 3520: > 3518: // Check for useless sign-extension before a partial-word store > 3519: // (StoreB ... (RShiftI _ (LShiftI _ v conIL) conIR)) > 3520: // If (conIL == conIR && conIR <= num_bits) this simplifies to Suggestion: // If (conIL == conIR && conIR <= num_rejected_bits) this simplifies to Since we renamed the argument ;) src/hotspot/share/opto/memnode.cpp line 3549: > 3547: // - conIL >= conIR > 3548: // - num_rejected_bits >= conIR > 3549: // Remembering that only the 8 lower bits have to be correct. To me, this is the core of the comments here, the Statement that we want to prove. so we could highlight it. Maybe like this? Suggestion: // // Statement (proved further below in case analysis): // Given: // - 0 <= conIL < BitsPerJavaInteger (no wrap in shift) // - 0 <= conIR < BitsPerJavaInteger (no wrap in shift) // - conIL >= conIR // - num_rejected_bits >= conIR // Then this form: // (RShiftI _ (LShiftI _ v conIL) conIR) // can be replaced with this form: // (LShiftI _ v (conIL-conIR)) // // Note: We only have to show that the non-rejected lowest bits (8 bits for byte) have to be correct, // as the higher bits are rejected / truncated by the store. src/hotspot/share/opto/memnode.cpp line 3563: > 3561: // ###### Case 1.1: conIL == conIR == num_rejected_bits > 3562: // If we do the shift left then right by 24 bits, we get: > 3563: // after << 24 Suggestion: // after "<< 24" This could help visually, I had to stare at it for 3 sec until I knew it was not a shifting of `after`, i.e. `after << 24` src/hotspot/share/opto/memnode.cpp line 3606: > 3604: // The non-rejected bits are the 8 lower bits of v. The bits 8 and 9 of v are still > 3605: // present in (v << 22) >> 22 but will be dropped by the store. The simplification is > 3606: // still correct. 
I think you copied this from case `1.2` ? This case is NOT ok to simplify, that's why we needed that condition in the "Statement", right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r2005548215 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r2005568763 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r2005573848 PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r2005585291 From epeter at openjdk.org Thu Mar 20 13:29:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Mar 2025 13:29:15 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v15] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 12:52:35 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> more checks > > src/hotspot/share/opto/memnode.cpp line 3563: > >> 3561: // ###### Case 1.1: conIL == conIR == num_rejected_bits >> 3562: // If we do the shift left then right by 24 bits, we get: >> 3563: // after << 24 > > Suggestion: > > // after "<< 24" > > This could help visually, I had to stare at it for 3 sec until I knew it was not a shifting of `after`, i.e. 
`after << 24` Alternatively something like `after: << 24` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r2005574740 From epeter at openjdk.org Thu Mar 20 13:29:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Mar 2025 13:29:15 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v15] In-Reply-To: References: Message-ID: <38w_zuEkTVxDZMuzrezzPlHKInlkyTGuoVDHxRHy2Uo=.02729d92-f8db-49aa-aa0a-c6bb268fb7af@github.com> On Thu, 20 Mar 2025 12:53:05 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/memnode.cpp line 3563: >> >>> 3561: // ###### Case 1.1: conIL == conIR == num_rejected_bits >>> 3562: // If we do the shift left then right by 24 bits, we get: >>> 3563: // after << 24 >> >> Suggestion: >> >> // after "<< 24" >> >> This could help visually, I had to stare at it for 3 sec until I knew it was not a shifting of `after`, i.e. `after << 24` > > Alternatively something like `after: << 24` Or even: `after: v << 24` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23728#discussion_r2005576711 From epeter at openjdk.org Thu Mar 20 13:40:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Mar 2025 13:40:14 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call [v3] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 12:34:39 GMT, Roberto Castañeda Lozano wrote: >> The [array fill optimization](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/share/opto/loopTransform.cpp#L3533) replaces simple innermost loops that fill an array with copies of the same primitive value: >> >> >> for (int i = 0; i < array.length; i++) { >> array[i] = 0; >> } >> >> with a call to an [array filling intrinsic](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp#L1665) that is specialized for the array 
element type: >> >> >> arrayof_jint_fill(array, 0, array.length) >> >> >> The optimization retrieves the (basic) array element type from calling `MemNode::memory_type()` on the original filling store. This is incorrect for stores of `short` values, since these are represented by `StoreC` nodes [whose `memory_type()` is `T_CHAR`](https://github.com/openjdk/jdk/blob/1fe45265e446eeca5dc496085928ce20863a3172/src/hotspot/share/opto/memnode.hpp#L680). As a result, the optimization wrongly assigns the address type `char[]` to `short` array fill loops. This can cause miscompilations due to missing anti-dependences, see the [issue description for further detail](https://bugs.openjdk.org/projects/JDK/issues/JDK-8351468). >> >> This changeset proposes retrieving the (basic) array element type from the store address type instead. This ensures that the accurate address type is assigned to the intrinsic call, preventing missed anti-dependences and other potential issues caused by mismatching types. Additionally, the changeset makes it easier to reason about correctness by explicitly disabling the optimization for mismatched stores (where the type of the value to be stored differs from the element type of the destination array). Such stores were not optimized before, but only due to pattern matching limitations. >> >> Assuming mismatched stores are discarded (as proposed here), an alternative solution would be to define a StoreS node returning the appropriate `memory_type()`. This could be desirable even as a complement to this fix, to prevent similar bugs in the future. I propose to investigate the introduction of a StoreS node in a separate RFE, because it is a much larger and more intrusive changeset, and go with this minimal, local, and non-intrusive fix for backportability. >> >> **Testing:** tier1-5, stress testing (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). 
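For readers who want to see the loop shape in question, here is a minimal Java sketch of a `short` array fill, the case where the description above says the store is represented by a `StoreC` node. The class and method names (`ShortFill`, `fill`) are made up for illustration and are not taken from the PR's tests. Whether C2 actually replaces such a loop with the fill intrinsic depends on the platform and on flags such as `-XX:+OptimizeFill`, so this only demonstrates the Java-level semantics that any compiled version has to preserve.

```java
import java.util.Arrays;

class ShortFill {
    // Simple innermost loop that stores the same primitive value into every
    // element. This is the general shape the array fill optimization looks for.
    static void fill(short[] array, short value) {
        for (int i = 0; i < array.length; i++) {
            array[i] = value;
        }
    }

    public static void main(String[] args) {
        short[] actual = new short[1024];
        short[] expected = new short[1024];
        fill(actual, (short) -7);
        Arrays.fill(expected, (short) -7);
        // The hand-written loop and the library fill must agree element by element.
        System.out.println(Arrays.equals(actual, expected));
    }
}
```

A real regression test for the bug would also need a load from the same array placed so that a missing anti-dependence changes the result, as described in the linked issue; that part is intentionally left out here.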
> > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Explicitly disable optimization for mismatching stores; add positive and negative tests @robcasloz Nice catch, I'm glad you dug this up and found a reproducer :partying_face: Yes, taking the element type from the address is the best, that way you actually depend on the array, not the type of the store. test/hotspot/jtreg/compiler/loopopts/TestArrayFillAntiDependence.java line 34: > 32: * scheduled correctly, for different load and array fill types. > 33: * See detailed comments in testShort() below. > 34: * @requires vm.compiler2.enabled Is the test so expensive that C2 is required? Or can you just put `-XX:+IgnoreUnrecognizedVMOptions` in the run that has C2 flags? test/hotspot/jtreg/compiler/loopopts/TestArrayFillIntrinsic.java line 46: > 44: // compiler might decide to unroll the array-filling loop instead of > 45: // replacing it with an intrinsic call even if OptimizeFill is enabled. > 46: TestFramework.runWithFlags("-XX:LoopUnrollLimit=0", "-XX:+OptimizeFill"); What about a run without flags just in case? test/hotspot/jtreg/compiler/loopopts/TestArrayFillIntrinsic.java line 46: > 44: // compiler might decide to unroll the array-filling loop instead of > 45: // replacing it with an intrinsic call even if OptimizeFill is enabled. > 46: TestFramework.runWithFlags("-XX:LoopUnrollLimit=0", "-XX:+OptimizeFill"); What about a run without flags just in case? ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24005#pullrequestreview-2702771147 PR Review Comment: https://git.openjdk.org/jdk/pull/24005#discussion_r2005661671 PR Review Comment: https://git.openjdk.org/jdk/pull/24005#discussion_r2005664073 PR Review Comment: https://git.openjdk.org/jdk/pull/24005#discussion_r2005664394 From eastigeevich at openjdk.org Thu Mar 20 13:46:06 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 20 Mar 2025 13:46:06 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache [v2] In-Reply-To: References: Message-ID: > This benchmark is used to check performance impact of the code cache being sparse. > > We use C2 compiler to compile the same Java method multiple times to produce as many code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods. 
> > Results: code region size 2M (2097152) bytes > - Intel Xeon Platinum 8259CL > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |19.577 |0.619 |us/op | | > |128 |32 |4 |22.968 |0.314 |us/op |17.30% | > |128 |48 |3 |22.245 |0.388 |us/op |13.60% | > |128 |64 |2 |23.874 |0.84 |us/op |21.90% | > |128 |80 |2 |23.786 |0.231 |us/op |21.50% | > |128 |96 |1 |26.224 |1.16 |us/op |34% | > |128 |112 |1 |27.028 |0.461 |us/op |38.10% | > |256 |1 |256 |47.43 |1.146 |us/op | | > |256 |32 |8 |63.962 |1.671 |us/op |34.90% | > |256 |48 |5 |63.396 |0.247 |us/op |33.70% | > |256 |64 |4 |66.604 |2.286 |us/op |40.40% | > |256 |80 |3 |59.746 |1.273 |us/op |26% | > |256 |96 |3 |63.836 |1.034 |us/op |34.60% | > |256 |112 |2 |63.538 |1.814 |us/op |34% | > |512 |1 |512 |172.731 |4.409 |us/op | | > |512 |32 |16 |206.772 |6.229 |us/op |19.70% | > |512 |48 |11 |215.275 |2.228 |us/op |24.60% | > |512 |64 |8 |212.962 |2.028 |us/op |23.30% | > |512 |80 |6 |201.335 |12.519 |us/op |16.60% | > |512 |96 |5 |198.133 |6.502 |us/op |14.70% | > |512 |112 |5 |193.739 |3.812 |us/op |12.20% | > |768 |1 |768 |325.154 |5.048 |us/op | | > |768 |32 |24 |346.298 |20.196 |us/op |6.50% | > |768 |48 |16 |350.746 |2.931 |us/op |7.90% | > |768 |64 |12 |339.445 |7.927 |us/op |4.40% | > |768 |80 |10 |347.408 |7.355 |us/op |6.80% | > |768 |96 |8 |340.983 |3.578 |us/op |4.90% | > |768 |112 |7 |353.949 |2.98 |us/op |8.90% | > |1024 |1 |1024 |368.352 |5.961 |us/op | | > |1024 |32 |32 |463.822 |6.274 |us/op |25.90% | > |1024 |48 |21 |457.674 |15.144 |us/op |24.20% | > |1024 |64 |16 |477.694 |0.986 |us/op |29.70% | > |1024 |80 |13 |484.901 |32.601 |us/op |31.60% | > |1024 |96 |11 |480.8 |27.088 |us/op |30.50% | > |1024 |112 |9 |474.416 |10.053 |us/op |28.80% | > > - AArch64 Neoverse N1 > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |25.297 |0.792 |us/op | | > 
|128 |32 |4 |31.451... Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Separate active methods and method calling them with 128Mb dummy space ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23831/files - new: https://git.openjdk.org/jdk/pull/23831/files/ddcd0341..ef7d9898 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23831&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23831&range=00-01 Stats: 55 lines in 1 file changed: 48 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23831.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23831/head:pull/23831 PR: https://git.openjdk.org/jdk/pull/23831 From epeter at openjdk.org Thu Mar 20 13:54:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Mar 2025 13:54:07 GMT Subject: RFR: 8346989: Deoptimization and re-compilation cycle with C2 compiled code In-Reply-To: <_3l8ylsbgvsqQE1Ihp0BUAx2o_VzcS6R2jWBSKW9u1E=.0dcb6086-ff6f-4c9a-b990-6665a476a3dc@github.com> References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> <_3l8ylsbgvsqQE1Ihp0BUAx2o_VzcS6R2jWBSKW9u1E=.0dcb6086-ff6f-4c9a-b990-6665a476a3dc@github.com> Message-ID: <5rSvBeQxKuX-hhaLGygKRBi_VpALqwywgnKfK61a8j4=.258cf9ca-56fe-42a9-85b1-b6aa30f2eb5c@github.com> On Fri, 7 Mar 2025 18:03:53 GMT, Vladimir Ivanov wrote: >> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. >> >> tl;dr: >> - C1: no problem, no change >> - C2: >> - with intrinsics: >> - with overflow: clear improvement. 
Was way worse than C1, now is similar (~4s => ~600ms) >> - without overflow: no problem, no change >> - without intrinsics: no problem, no change >> >> Before the fix: >> >> Benchmark (SIZE) Mode Cnt Score Error Units >> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ± 0.048 ms/op >> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ± 58.238 ms/op >> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ± 0.842 ms/op >> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ± 229.425 ms/op >> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ± 22.244 ms/op >> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ± 279.003 ms/op >> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ± 0.810 ms/op >> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ± 141.792 ms/op >> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ± 12.822 ms/op >> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ± 274.396 ms/op >> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ± 3.316 ms/op >> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ± 71.879 ms/op >> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ± 0.587 ms/op >> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ± 252.137 ms/op >> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ± 5.718 ms/op >> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ± 147.432 ms/op >> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ± 0.027 ms/op >> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ± 30.841 ms/op >> MathExact.C1_1.loop... > > Nice benchmark, Marc! @iwanowww Are you still reviewing or should I have a look? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23916#issuecomment-2740528216 From epeter at openjdk.org Thu Mar 20 13:55:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Mar 2025 13:55:14 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v8] In-Reply-To: References: <6l8orDGDTI-ADWxEmDjMPX1uorIhxLd3T55s0eIzJ3I=.0cb9d2c8-4302-408f-b64e-dc9a8e3d4145@github.com> Message-ID: <2uVU0x7eKKHHDLTEAftG-e9qUSMG95Z968T3uyMsKDc=.f24889f5-098b-4b7b-9943-a794564b69f0@github.com> On Fri, 7 Feb 2025 14:40:24 GMT, Daniel Lundén wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix trailing whitespace > > Keep alive @dlunde What's the state with this one? Are you looking for reviews? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2740534688 From chagedorn at openjdk.org Thu Mar 20 14:32:38 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 20 Mar 2025 14:32:38 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v6] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 16:18:33 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. >> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". 
>> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java > > Co-authored-by: Christian Hagedorn There was a test failure in a bigger test that I cannot share. But I was able to extract a simple reproducer: public class Test { public static void main(String[] args) { test(); } static void test() { Integer.parseInt("1"); } } Run with (might need to run multiple times or increase `RepeatCompilation` count since it is dependent on the seed): java -XX:RepeatCompilation=300 -XX:+StressIGVN -XX:+StressCCP -Xcomp -XX:CompileOnly=*Integer::parseInt Test.java Output: 304 ConI === 0 [[ 506 ]] #int:255 996 CastII === 461 453 [[ 557 546 535 524 1034 506 ]] #int:-256..127 extra types: {0:int:-256} strong dependency !orig=[478] !jvms: Integer::parseInt @ bci:144 (line 550) 506 AndI === _ 996 304 [[ 507 ]] !jvms: Integer::parseInt @ bci:170 (line 552) told = int:0 tnew = top # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/opt/mach5/mesos/work_dir/slaves/2a0767be-5c1b-4719-9b4f-f71b11137965-S769/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/1111ed35-4920-43a2-b273-caace0afc18d/runs/09365b8c-550a-453d-9bbb-ff3c7dc04263/workspace/open/src/hotspot/share/opto/phaseX.cpp:1790), pid=196209, tid=196227 # fatal error: Not monotonic # # JRE version: Java(TM) SE Runtime Environment (25.0) (fastdebug build 25-internal-LTS-2025-03-19-2013139.christian.hagedorn.jdk-test) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 25-internal-LTS-2025-03-19-2013139.christian.hagedorn.jdk-test, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x16a26c9] PhaseCCP::verify_type(Node*, Type const*, Type const*)+0x169 ------------- PR 
Comment: https://git.openjdk.org/jdk/pull/23871#issuecomment-2740654541 From duke at openjdk.org Thu Mar 20 14:33:55 2025 From: duke at openjdk.org (Marc Chevalier) Date: Thu, 20 Mar 2025 14:33:55 GMT Subject: RFR: 8335708: C2: Compile::verify_graph_edges must start at root and safepoints, just like CCP traversal [v3] In-Reply-To: References: Message-ID: > In CCP, we transform the nodes going up (toward inputs) starting from root and safepoints because infinite loops can be reachable from the root, but not co-reachable from the root, that is one can follow def-use from root to the loop, but not the use-def from root to loop. For more details, see: > https://github.com/openjdk/jdk/blob/4cf63160ad575d49dbe70f128cd36aba22b8f2ff/src/hotspot/share/opto/phaseX.cpp#L2063-L2070 > > Since we are specifically marking nodes as useful if they are above a safepoint, the check that no dead nodes must be there anymore must also consider nodes above a safepoint as alive: the same criterion must apply. We should nevertheless not start from a safepoint killed by CCP. > > About the test, I use this trick found in `TestInfiniteLoopCCP` because I indeed need a really infinite loop, but I want a terminating test. The crash is not deterministic, as it needs StressIGVN, so I did a bit of stats. Using a little helper script, on 100 runs, 69 runs fail as in the JBS ticket and 31 are successful (so 0 fail in another way). After the fix, I find 100 successes. > > And thanks to @eme64 who extracted such a concise reproducer. 
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: mostly punctuation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23977/files - new: https://git.openjdk.org/jdk/pull/23977/files/6f8fce6e..6a149e23 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23977&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23977&range=01-02 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23977.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23977/head:pull/23977 PR: https://git.openjdk.org/jdk/pull/23977 From chagedorn at openjdk.org Thu Mar 20 14:39:08 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 20 Mar 2025 14:39:08 GMT Subject: RFR: 8335708: C2: Compile::verify_graph_edges must start at root and safepoints, just like CCP traversal [v3] In-Reply-To: References: Message-ID: <6fvSXrE35IkefkNNnCPFmUNAAwvc0rp7v-IJPZicAyo=.077e58d7-4c08-402b-b185-34311add83e1@github.com> On Thu, 20 Mar 2025 14:33:55 GMT, Marc Chevalier wrote: >> In CCP, we transform the nodes going up (toward inputs) starting from root and safepoints because infinite loops can be reachable from the root, but not co-reachable from the root, that is one can follow def-use from root to the loop, but not the use-def from root to loop. For more details, see: >> https://github.com/openjdk/jdk/blob/4cf63160ad575d49dbe70f128cd36aba22b8f2ff/src/hotspot/share/opto/phaseX.cpp#L2063-L2070 >> >> Since we are specifically marking nodes as useful if they are above a safepoint, the check that no dead nodes must be there anymore must also consider nodes above a safepoint as alive: the same criterion must apply. We should nevertheless not start from a safepoint killed by CCP. >> >> About the test, I use this trick found in `TestInfiniteLoopCCP` because I indeed need a really infinite loop, but I want a terminating test. 
The crash is not deterministic, as it needs StressIGVN, so I did a bit of stats. Using a little helper script, on 100 runs, 69 runs fail as in the JBS ticket and 31 are successful (so 0 fail in another way). After the fix, I find 100 successes. >> >> And thanks to @eme64 who extracted such a concise reproducer. > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > mostly punctuation Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23977#pullrequestreview-2702996978 From duke at openjdk.org Thu Mar 20 14:39:08 2025 From: duke at openjdk.org (Marc Chevalier) Date: Thu, 20 Mar 2025 14:39:08 GMT Subject: RFR: 8335708: C2: Compile::verify_graph_edges must start at root and safepoints, just like CCP traversal [v3] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 14:33:55 GMT, Marc Chevalier wrote: >> In CCP, we transform the nodes going up (toward inputs) starting from root and safepoints because infinite loops can be reachable from the root, but not co-reachable from the root, that is one can follow def-use from root to the loop, but not the use-def from root to loop. For more details, see: >> https://github.com/openjdk/jdk/blob/4cf63160ad575d49dbe70f128cd36aba22b8f2ff/src/hotspot/share/opto/phaseX.cpp#L2063-L2070 >> >> Since we are specifically marking nodes as useful if they are above a safepoint, the check that no dead nodes must be there anymore must also consider nodes above a safepoint as alive: the same criterion must apply. We should nevertheless not start from a safepoint killed by CCP. >> >> About the test, I use this trick found in `TestInfiniteLoopCCP` because I indeed need a really infinite loop, but I want a terminating test. The crash is not deterministic, as it needs StressIGVN, so I did a bit of stats. Using a little helper script, on 100 runs, 69 runs fail as in the JBS ticket and 31 are successful (so 0 fail in another way). 
After the fix, I find 100 successes. >> >> And thanks to @eme64 who extracted such a concise reproducer. > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > mostly punctuation I applied your fixes. I don't think I need proper review again, but I need re-approval. EDIT: nevermind, @chhagedorn was faster than me. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23977#issuecomment-2740677836 PR Comment: https://git.openjdk.org/jdk/pull/23977#issuecomment-2740684260 From duke at openjdk.org Thu Mar 20 14:39:09 2025 From: duke at openjdk.org (duke) Date: Thu, 20 Mar 2025 14:39:09 GMT Subject: RFR: 8335708: C2: Compile::verify_graph_edges must start at root and safepoints, just like CCP traversal [v3] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 14:33:55 GMT, Marc Chevalier wrote: >> In CCP, we transform the nodes going up (toward inputs) starting from root and safepoints because infinite loops can be reachable from the root, but not co-reachable from the root, that is one can follow def-use from root to the loop, but not the use-def from root to loop. For more details, see: >> https://github.com/openjdk/jdk/blob/4cf63160ad575d49dbe70f128cd36aba22b8f2ff/src/hotspot/share/opto/phaseX.cpp#L2063-L2070 >> >> Since we are specifically marking nodes as useful if they are above a safepoint, the check that no dead nodes must be there anymore must also consider nodes above a safepoint as alive: the same criterion must apply. We should nevertheless not start from a safepoint killed by CCP. >> >> About the test, I use this trick found in `TestInfiniteLoopCCP` because I indeed need a really infinite loop, but I want a terminating test. The crash is not deterministic, as it needs StressIGVN, so I did a bit of stats. Using a little helper script, on 100 runs, 69 runs fail as in the JBS ticket and 31 are successful (so 0 fail in another way). 
After the fix, I find 100 successes. >> >> And thanks to @eme64 who extracted such a concise reproducer. > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > mostly punctuation @marc-chevalier Your change (at version 6a149e23f61fc61ec8175b8fef138b768b180015) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23977#issuecomment-2740686065 From thartmann at openjdk.org Thu Mar 20 14:40:09 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 20 Mar 2025 14:40:09 GMT Subject: RFR: 8352317: Assertion failure during size estimation of BoxLockNode with -XX:+UseAPX In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 06:50:39 GMT, Jatin Bhateja wrote: > This patch fixes a crash during PhaseOutput while estimating the size of BoxLockNode. > LEA instruction used to load the stack location holding a thread-specific lock in fast locking mode did not account for an additional byte of REX2 prefix if the destination register is an EGPR. > > LEA GPR/EGPR OFFSET(RSP). > > This fixed multiple issues seen in SPECjvm2008 worklets. > > The issue can be reproduced after changing the static allocation ordering in x86_64.ad giving preference to the EGPR register. > [JDK-8343294](https://bugs.openjdk.org/browse/JDK-8343294) tracks the requirement to randomize the allocation sequence. > > Kindly review and share your feedback. > > Best Regards, > Jatin Looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24109#pullrequestreview-2703010555 From duke at openjdk.org Thu Mar 20 14:42:20 2025 From: duke at openjdk.org (David Linus Briemann) Date: Thu, 20 Mar 2025 14:42:20 GMT Subject: RFR: 8352512: TestVectorZeroCount: counter not reset between iterations Message-ID: 8352512: TestVectorZeroCount: counter not reset between iterations ------------- Commit messages: - 8352512: TestVectorZeroCount: counter not reset between iterations Changes: https://git.openjdk.org/jdk/pull/24134/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24134&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352512 Stats: 11 lines in 1 file changed: 7 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24134.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24134/head:pull/24134 PR: https://git.openjdk.org/jdk/pull/24134 From mdoerr at openjdk.org Thu Mar 20 14:42:21 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 20 Mar 2025 14:42:21 GMT Subject: RFR: 8352512: TestVectorZeroCount: counter not reset between iterations In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 14:31:47 GMT, David Linus Briemann wrote: > 8352512: TestVectorZeroCount: counter not reset between iterations LGTM. Thanks for fixing it! ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24134#pullrequestreview-2703010933 From thartmann at openjdk.org Thu Mar 20 15:19:10 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 20 Mar 2025 15:19:10 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call [v3] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 12:34:39 GMT, Roberto Castañeda Lozano wrote: >> The [array fill optimization](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/share/opto/loopTransform.cpp#L3533) replaces simple innermost loops that fill an array with copies of the same primitive value: >> >> >> for (int i = 0; i < array.length; i++) { >> array[i] = 0; >> } >> >> with a call to an [array filling intrinsic](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp#L1665) that is specialized for the array element type: >> >> >> arrayof_jint_fill(array, 0, array.length) >> >> >> The optimization retrieves the (basic) array element type from calling `MemNode::memory_type()` on the original filling store. This is incorrect for stores of `short` values, since these are represented by `StoreC` nodes [whose `memory_type()` is `T_CHAR`](https://github.com/openjdk/jdk/blob/1fe45265e446eeca5dc496085928ce20863a3172/src/hotspot/share/opto/memnode.hpp#L680). As a result, the optimization wrongly assigns the address type `char[]` to `short` array fill loops. This can cause miscompilations due to missing anti-dependences, see the [issue description for further detail](https://bugs.openjdk.org/projects/JDK/issues/JDK-8351468). >> >> This changeset proposes retrieving the (basic) array element type from the store address type instead. This ensures that the accurate address type is assigned to the intrinsic call, preventing missed anti-dependences and other potential issues caused by mismatching types.
Additionally, the changeset makes it easier to reason about correctness by explicitly disabling the optimization for mismatched stores (where the type of the value to be stored differs from the element type of the destination array). Such stores were not optimized before, but only due to pattern matching limitations. >> >> Assuming mismatched stores are discarded (as proposed here), an alternative solution would be to define a StoreS node returning the appropriate `memory_type()`. This could be desirable even as a complement to this fix, to prevent similar bugs in the future. I propose to investigate the introduction of a StoreS node in a separate RFE, because it is a much larger and more intrusive changeset, and go with this minimal, local, and non-intrusive fix for backportability. >> >> **Testing:** tier1-5, stress testing (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Explicitly disable optimization for mismatching stores; add positive and negative tests Good catch and nice tests! The fix looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24005#pullrequestreview-2703151758 From chagedorn at openjdk.org Thu Mar 20 15:22:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 20 Mar 2025 15:22:13 GMT Subject: RFR: 8352512: TestVectorZeroCount: counter not reset between iterations In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 14:31:47 GMT, David Linus Briemann wrote: > 8352512: TestVectorZeroCount: counter not reset between iterations Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24134#pullrequestreview-2703161822 From qamai at openjdk.org Thu Mar 20 15:38:09 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 20 Mar 2025 15:38:09 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call [v3] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 12:34:39 GMT, Roberto Castañeda Lozano wrote: >> The [array fill optimization](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/share/opto/loopTransform.cpp#L3533) replaces simple innermost loops that fill an array with copies of the same primitive value: >> >> >> for (int i = 0; i < array.length; i++) { >> array[i] = 0; >> } >> >> with a call to an [array filling intrinsic](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp#L1665) that is specialized for the array element type: >> >> >> arrayof_jint_fill(array, 0, array.length) >> >> >> The optimization retrieves the (basic) array element type from calling `MemNode::memory_type()` on the original filling store. This is incorrect for stores of `short` values, since these are represented by `StoreC` nodes [whose `memory_type()` is `T_CHAR`](https://github.com/openjdk/jdk/blob/1fe45265e446eeca5dc496085928ce20863a3172/src/hotspot/share/opto/memnode.hpp#L680). As a result, the optimization wrongly assigns the address type `char[]` to `short` array fill loops. This can cause miscompilations due to missing anti-dependences, see the [issue description for further detail](https://bugs.openjdk.org/projects/JDK/issues/JDK-8351468). >> >> This changeset proposes retrieving the (basic) array element type from the store address type instead. This ensures that the accurate address type is assigned to the intrinsic call, preventing missed anti-dependences and other potential issues caused by mismatching types.
Additionally, the changeset makes it easier to reason about correctness by explicitly disabling the optimization for mismatched stores (where the type of the value to be stored differs from the element type of the destination array). Such stores were not optimized before, but only due to pattern matching limitations. >> >> Assuming mismatched stores are discarded (as proposed here), an alternative solution would be to define a StoreS node returning the appropriate `memory_type()`. This could be desirable even as a complement to this fix, to prevent similar bugs in the future. I propose to investigate the introduction of a StoreS node in a separate RFE, because it is a much larger and more intrusive changeset, and go with this minimal, local, and non-intrusive fix for backportability. >> >> **Testing:** tier1-5, stress testing (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Explicitly disable optimization for mismatching stores; add positive and negative tests @robcasloz My concern is that `MemNode::memory_type` does not do what it seems to do. I wonder if there are other places misusing this method. The concern is orthogonal to this issue, though. ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/24005#pullrequestreview-2703219847 From sviswanathan at openjdk.org Thu Mar 20 15:42:17 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 20 Mar 2025 15:42:17 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v14] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 17:50:20 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32.
Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. >> >> For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > add comments for encoding and UCF We are looking to integrate this PR by next week. @vnkozlov @eme64 could you please run this PR through your testing if possible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2740888471 From never at openjdk.org Thu Mar 20 15:47:10 2025 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 20 Mar 2025 15:47:10 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> <0pScaTfNpM4LYdukKYZIdk1BluQVvUM2BdssI4bjisI=.d92d627f-ce02-406c-bed3-38d31692c2ef@github.com> Message-ID: On Thu, 20 Mar 2025 01:33:58 GMT, Dean Long wrote: >> I agree with Vladimir that it seems like something is wrong with the block probabilities. In product it would be fine to simply clamp these values in the range of 0..100 since they are just used to compute `CFGEdge::_infrequent` so the worst thing you get is a less good layout. Refactoring the expressions so it's more clear what the requirements wouldn't hurt either. > > There may be a bug in frequency propagation. I don't understand the connector/non-connector logic, but when I reproduce this, the successor has a loop block with high _freq, but then we use non_connector_successor() to get the successor, and that gives us instead a different block which originally had 0 _freq, but got changed to MIN_BLOCK_FREQUENCY by CFGLoop::scale_freq(). 
Could the problem be inconsistent use of non_connector_successor? Isn't there some logic which forces deopt blocks to have low frequencies? If it's only setting the non_connector_successor frequency then you'd get inconsistent results if you didn't use non_connector_successor for examining successors. Maybe it should be adjusting the frequency of all blocks leading to the non_connector_successor? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r2005942687 From duke at openjdk.org Thu Mar 20 15:51:53 2025 From: duke at openjdk.org (Marc Chevalier) Date: Thu, 20 Mar 2025 15:51:53 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v16] In-Reply-To: References: Message-ID: > This collapses two left shifts by constants into a single shift: (x << con1) << con2 => x << (con1 + con2). Care must be taken in the case con1 + con2 is greater than or equal to the number of bits in the integer type. In this case, we must simplify to 0. > > Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32-bit integer into a 16-bit integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simple `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR look like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitudes. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. > > This also works for multiplications by powers of 2 since they are already translated into shifts.
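A minimal sketch of the arithmetic behind the shift collapsing described above, assuming 32-bit `int` operands and shift constants in the range 0..31. It models only the values, not C2's IR transformation, and the names `ShiftCollapseSketch`, `twoShifts`, `collapsed`, `naive`, `viaSignExtension` and `direct` are made up for illustration. The point it shows is that simply adding the two constants is not enough in Java semantics, because a shift count is masked to its low 5 bits, so `x << 32` yields `x` rather than 0.

```java
// Illustrative only; models the arithmetic, not C2's IR transformation.
class ShiftCollapseSketch {
    // The two shifts as written in the source program.
    static int twoShifts(int x, int con1, int con2) {
        return (x << con1) << con2;
    }

    // Collapsed form: all bits are shifted out once the sum reaches 32.
    static int collapsed(int x, int con1, int con2) {
        int sum = con1 + con2;
        return sum >= Integer.SIZE ? 0 : x << sum;
    }

    // Naive collapse: wrong when con1 + con2 >= 32, because Java masks
    // the shift count of an int shift to its low 5 bits.
    static int naive(int x, int con1, int con2) {
        return x << (con1 + con2);
    }

    // Sign-extension trick applied to y << 3, written as a single left
    // shift by 19 followed by an arithmetic right shift by 16.
    static short viaSignExtension(int y) {
        return (short) ((y << 19) >> 16);
    }

    // What is observable once the value is stored into a 16-bit field.
    static short direct(int y) {
        return (short) (y << 3);
    }

    public static void main(String[] args) {
        int[] samples = { 0, 1, -1, 0x1234, 0x7fffffff, 0x80000000 };
        for (int x : samples) {
            for (int con1 = 0; con1 < 32; con1++) {
                for (int con2 = 0; con2 < 32; con2++) {
                    if (twoShifts(x, con1, con2) != collapsed(x, con1, con2)) {
                        throw new AssertionError("mismatch for x=" + x);
                    }
                }
            }
            if (viaSignExtension(x) != direct(x)) {
                throw new AssertionError("sign extension mismatch for x=" + x);
            }
        }
        System.out.println(twoShifts(1, 20, 12)); // 0
        System.out.println(naive(1, 20, 12));     // 1
    }
}
```

The last two lines show the corner case: `(1 << 20) << 12` is 0, while `1 << 32` evaluates to 1 in Java.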
> > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Rephrase comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23728/files - new: https://git.openjdk.org/jdk/pull/23728/files/e3ecf350..124d9382 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=14-15 Stats: 52 lines in 1 file changed: 18 ins; 6 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/23728.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23728/head:pull/23728 PR: https://git.openjdk.org/jdk/pull/23728 From duke at openjdk.org Thu Mar 20 15:51:53 2025 From: duke at openjdk.org (Marc Chevalier) Date: Thu, 20 Mar 2025 15:51:53 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v15] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 10:58:14 GMT, Marc Chevalier wrote: >> This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). Care must be taken in the case con1 + con2 is bigger than the number of bits in the integer type. In this case, we must simplify to 0. >> >> Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32 bits into a 16 bits integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simple `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR looks like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitude. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. 
This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. >> >> This also works for multiplications by powers of 2 since they are already translated into shifts. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > more checks Fixed all the comments. You can do the next iteration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23728#issuecomment-2740920191 From duke at openjdk.org Thu Mar 20 15:53:20 2025 From: duke at openjdk.org (Marc Chevalier) Date: Thu, 20 Mar 2025 15:53:20 GMT Subject: Integrated: 8335708: C2: Compile::verify_graph_edges must start at root and safepoints, just like CCP traversal In-Reply-To: References: Message-ID: <2nq9AH3P1YUoTERAd5cadxvD9VJtgkmEXY7PooNdtQU=.768661b9-697d-42f2-a6ce-d3241f1fc18e@github.com> On Tue, 11 Mar 2025 09:12:44 GMT, Marc Chevalier wrote: > In CCP, we transform the nodes going up (toward inputs) starting from root and safepoints because infinite loops can be reachable from the root, but not co-reachable from the root, that is one can follow def-use from root to the loop, but not the use-def from root to loop. For more details, see: > https://github.com/openjdk/jdk/blob/4cf63160ad575d49dbe70f128cd36aba22b8f2ff/src/hotspot/share/opto/phaseX.cpp#L2063-L2070 > > Since we are specifically marking nodes as useful if they are above a safepoint, the check that no dead nodes must be there anymore must also consider nodes above a safepoint as alive: the same criterion must apply. We should nevertheless not start from a safepoint killed by CCP. > > About the test, I use this trick found in `TestInfiniteLoopCCP` because I indeed need a really infinite loop, but I want a terminating test. The crash is not deterministic, as it needs StressIGVN, so I did a bit of stats. 
Using a little helper script, on 100 runs, 69 runs fail as in the JBS ticket and 31 are successful (so 0 fail in another way). After the fix, I find 100 successes. > > And thanks to @eme64 who extracted such a concise reproducer. This pull request has now been integrated. Changeset: 2bc4f64c Author: Marc Chevalier URL: https://git.openjdk.org/jdk/commit/2bc4f64c56ebc844d494a4ce8ba72a25643d4075 Stats: 114 lines in 5 files changed: 98 ins; 0 del; 16 mod 8335708: C2: Compile::verify_graph_edges must start at root and safepoints, just like CCP traversal Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/23977 From cushon at openjdk.org Thu Mar 20 17:08:09 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Thu, 20 Mar 2025 17:08:09 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v6] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 16:18:33 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. >> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". >> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. 
> > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java > > Co-authored-by: Christian Hagedorn >From Matthias Thank you Christian for that reproducer! Unless you've already tried, I think I'll first try and verify whether it's an already present issue at HEAD and this fix is simply incomplete (related to the CCP vs IGVN discussion on https://github.com/openjdk/jdk/pull/22856#issuecomment-2689917878) or whether it's being caused by this fix PR. I suspect it might be the former, in which case the question would be whether we want to go ahead with this fix for the CCP worklist to reduce noise? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23871#issuecomment-2741132640 From kvn at openjdk.org Thu Mar 20 17:38:29 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 20 Mar 2025 17:38:29 GMT Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 10:46:10 GMT, Boris Ulasevich wrote: >> How about this?: >> >> +++ b/src/hotspot/share/code/relocInfo.cpp >> @@ -117,6 +117,8 @@ void relocInfo::change_reloc_info_for_address(RelocIterator *itr, address pc, re >> // Implementation of RelocIterator >> >> +static relocInfo dummy_reloc[2]; >> + >> void RelocIterator::initialize(nmethod* nm, address begin, address limit) { >> initialize_misc(); >> >> @@ -127,8 +129,14 @@ void RelocIterator::initialize(nmethod* nm, address begin, address limit) { >> guarantee(nm != nullptr, "must be able to deduce nmethod from other arguments"); >> >> _code = nm; >> - _current = nm->relocation_begin() - 1; >> - _end = nm->relocation_end(); >> + // Check for no relocations case and use dummy data to avoid referencing wrong data. 
>> + if (nm->relocation_size() == 0) { >> + _current = dummy_reloc; >> + _end = dummy_reloc + 1; >> + } else { >> + _current = nm->relocation_begin() - 1; >> + _end = nm->relocation_end(); >> + } >> _addr = nm->content_begin(); >> >> // Initialize code sections. >> >> >> I filed RFE: [JDK-8352426](https://bugs.openjdk.org/browse/JDK-8352426) > > We can just add nullptr checks before pointer arithmetic in relocInfo: > > diff --git a/src/hotspot/share/code/relocInfo.cpp b/src/hotspot/share/code/relocInfo.cpp > index 7aae32759dd..c694f21e5ca 100644 > --- a/src/hotspot/share/code/relocInfo.cpp > +++ b/src/hotspot/share/code/relocInfo.cpp > @@ -127,7 +127,8 @@ void RelocIterator::initialize(nmethod* nm, address begin, address limit) { > guarantee(nm != nullptr, "must be able to deduce nmethod from other arguments"); > > _code = nm; > - _current = nm->relocation_begin() - 1; > + _current = nm->relocation_begin(); > + if (_current != nullptr) { _current--; } > _end = nm->relocation_end(); > _addr = nm->content_begin(); > > diff --git a/src/hotspot/share/code/relocInfo.hpp b/src/hotspot/share/code/relocInfo.hpp > index 25cca49e50b..b440e713493 100644 > --- a/src/hotspot/share/code/relocInfo.hpp > +++ b/src/hotspot/share/code/relocInfo.hpp > @@ -603,7 +603,7 @@ class RelocIterator : public StackObj { > > // get next reloc info, return !eos > bool next() { > - _current++; > + if (_current != nullptr) { _current++; } > assert(_current <= _end, "must not overrun relocInfo"); > if (_current == _end) { > set_has_current(false); I think we should not add additional check to `next()` - it is performance critical. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24102#discussion_r2006127495 From mli at openjdk.org Thu Mar 20 17:40:31 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 20 Mar 2025 17:40:31 GMT Subject: RFR: 8352529: RISC-V: enable loopopts tests Message-ID: Hi, Can you help to review this patch? 
This is also a follow-up of https://github.com/openjdk/jdk/pull/23985. There are a bunch of tests under `test/hotspot/jtreg/compiler/loopopts/` that could be enabled for riscv. Some tests fail after enabling them; I'll investigate further and send out a separate PR after fixing them. Thanks! ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/24138/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24138&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352529 Stats: 190 lines in 10 files changed: 0 ins; 0 del; 190 mod Patch: https://git.openjdk.org/jdk/pull/24138.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24138/head:pull/24138 PR: https://git.openjdk.org/jdk/pull/24138 From duke at openjdk.org Thu Mar 20 17:49:06 2025 From: duke at openjdk.org (David Linus Briemann) Date: Thu, 20 Mar 2025 17:49:06 GMT Subject: RFR: 8352512: TestVectorZeroCount: counter not reset between iterations In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 14:31:47 GMT, David Linus Briemann wrote: > 8352512: TestVectorZeroCount: counter not reset between iterations Thank you both for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24134#issuecomment-2741230368 From duke at openjdk.org Thu Mar 20 17:49:06 2025 From: duke at openjdk.org (duke) Date: Thu, 20 Mar 2025 17:49:06 GMT Subject: RFR: 8352512: TestVectorZeroCount: counter not reset between iterations In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 14:31:47 GMT, David Linus Briemann wrote: > 8352512: TestVectorZeroCount: counter not reset between iterations @dbriemann Your change (at version a4588f186b2048d01c8ed283b39635f0f6558b68) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24134#issuecomment-2741233430 From kvn at openjdk.org Thu Mar 20 17:59:08 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 20 Mar 2025 17:59:08 GMT Subject: RFR: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer In-Reply-To: References: Message-ID: <1L-ScFxwlky9Z_oCqPs2nQEcytWSbYseVLKnaiPoCao=.b190edda-5d41-4ced-8e56-dc7c8fbfc2be@github.com> On Wed, 19 Mar 2025 15:43:54 GMT, Doug Simon wrote: > This PR addresses undefined behavior in CodeBuffer by making `verify_section_allocation` return early for a partially initialized CodeBuffer. src/hotspot/share/asm/codeBuffer.hpp line 550: > 548: initialize_misc(name); > 549: _total_start = 0; > 550: _total_size = 0; Maybe we should move this initialization from `initialize()` to `initialize_misc()` so you don't need to do this here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24118#discussion_r2006175075 From rcastanedalo at openjdk.org Thu Mar 20 17:59:07 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 20 Mar 2025 17:59:07 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> <0pScaTfNpM4LYdukKYZIdk1BluQVvUM2BdssI4bjisI=.d92d627f-ce02-406c-bed3-38d31692c2ef@github.com> Message-ID: On Thu, 20 Mar 2025 15:44:31 GMT, Tom Rodriguez wrote: >> There may be a bug in frequency propagation. I don't understand the connector/non-connector logic, but when I reproduce this, the successor has a loop block with high _freq, but then we use non_connector_successor() to get the successor, and that gives us instead a different block which originally had 0 _freq, but got changed to MIN_BLOCK_FREQUENCY by CFGLoop::scale_freq().
> > Could the problem be inconsistent use of non_connector_successor? Isn't there some logic which forces deopt blocks to have low frequencies? If it's only setting the non_connector_successor frequency then you'd get inconsistent results if you didn't use non_connector_successor for examining successors. Maybe it should be adjusting the frequency of all blocks leading to the non_connector_successor? > There may be a bug in frequency propagation. I haven't looked at these failures but could it have to do with irreducible loops? This is a known cause of frequency propagation inaccuracy in C2, see [JDK-8258895](https://bugs.openjdk.org/browse/JDK-8258895). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r2006175477 From kvn at openjdk.org Thu Mar 20 18:03:09 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 20 Mar 2025 18:03:09 GMT Subject: RFR: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer In-Reply-To: <1L-ScFxwlky9Z_oCqPs2nQEcytWSbYseVLKnaiPoCao=.b190edda-5d41-4ced-8e56-dc7c8fbfc2be@github.com> References: <1L-ScFxwlky9Z_oCqPs2nQEcytWSbYseVLKnaiPoCao=.b190edda-5d41-4ced-8e56-dc7c8fbfc2be@github.com> Message-ID: On Thu, 20 Mar 2025 17:56:23 GMT, Vladimir Kozlov wrote: >> This PR addresses undefined behavior in CodeBuffer by making `verify_section_allocation` return early for a partially initialized CodeBuffer. > > src/hotspot/share/asm/codeBuffer.hpp line 550: > >> 548: initialize_misc(name); >> 549: _total_start = 0; >> 550: _total_size = 0; > > Maybe we should move this initialization from `initialize()` to `initialize_misc()` so you don't need to do this here. Otherwise the following constructor also doesn't set them. `initialize(csize_t code_size, csize_t locs_size)` does not.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24118#discussion_r2006181928 From jbhateja at openjdk.org Thu Mar 20 18:59:11 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 20 Mar 2025 18:59:11 GMT Subject: RFR: 8352317: Assertion failure during size estimation of BoxLockNode with -XX:+UseAPX In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 14:37:11 GMT, Tobias Hartmann wrote: >> This patch fixes a crash during PhaseOutput while estimating the size of BoxLockNode. >> LEA instruction used to load the stack location holding a thread-specific lock in fast locking mode did not account for an additional byte of REX2 prefix if the destination register is an EGPR. >> >> LEA GPR/EGPR OFFSET(RSP). >> >> This fixed multiple issues seen in SPECjvm2008 worklets. >> >> The issue can be reproduced after changing the static allocation ordering in x86_64.ad giving preference to the EGPR register. >> [JDK-8343294](https://bugs.openjdk.org/browse/JDK-8343294) tracks the requirement to randomize the allocation sequence. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Looks reasonable to me. Thanks @TobiHartmann ------------- PR Comment: https://git.openjdk.org/jdk/pull/24109#issuecomment-2741393546 From jbhateja at openjdk.org Thu Mar 20 18:59:12 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 20 Mar 2025 18:59:12 GMT Subject: Integrated: 8352317: Assertion failure during size estimation of BoxLockNode with -XX:+UseAPX In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 06:50:39 GMT, Jatin Bhateja wrote: > This patch fixes a crash during PhaseOutput while estimating the size of BoxLockNode. > LEA instruction used to load the stack location holding a thread-specific lock in fast locking mode did not account for an additional byte of REX2 prefix if the destination register is an EGPR. > > LEA GPR/EGPR OFFSET(RSP). > > This fixed multiple issues seen in SPECjvm2008 worklets. 
> > The issue can be reproduced after changing the static allocation ordering in x86_64.ad giving preference to the EGPR register. > [JDK-8343294](https://bugs.openjdk.org/browse/JDK-8343294) tracks the requirement to randomize the allocation sequence. > > Kindly review and share your feedback. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: 56038fb5 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/56038fb5a156568cce2e80f5db18b10ad61c06e4 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod 8352317: Assertion failure during size estimation of BoxLockNode with -XX:+UseAPX Reviewed-by: thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24109 From dlong at openjdk.org Thu Mar 20 19:28:08 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 20 Mar 2025 19:28:08 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> <0pScaTfNpM4LYdukKYZIdk1BluQVvUM2BdssI4bjisI=.d92d627f-ce02-406c-bed3-38d31692c2ef@github.com> Message-ID: On Thu, 20 Mar 2025 17:56:36 GMT, Roberto Castañeda Lozano wrote: >> Could the problem be inconsistent use of non_connector_successor? Isn't there some logic which forces deopt blocks to have low frequencies? If it's only setting the non_connector_successor frequency then you'd get inconsistent results if you didn't use non_connector_successor for examining successors. Maybe it should be adjusting the frequency of all blocks leading to the non_connector_successor? > >> There may be a bug in frequency propagation. > > I haven't looked at these failures but could it have to do with irreducible loops? This is a known reason of frequency propagation inaccuracy in C2, see [JDK-8258895](https://bugs.openjdk.org/browse/JDK-8258895). @robcasloz , that could be it, but I'm not enough of an expert. 
It seems to be this loop: https://github.com/openjdk/jdk/blob/56038fb5a156568cce2e80f5db18b10ad61c06e4/test/jdk/java/foreign/TestHandshake.java#L104 and the block gets detected as empty and moved to the end by PhaseCFG::remove_empty_blocks(). Roberto, do you have time to help look at this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r2006312398 From dnsimon at openjdk.org Thu Mar 20 20:06:49 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 20 Mar 2025 20:06:49 GMT Subject: RFR: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer [v2] In-Reply-To: References: Message-ID: <4lZVM2kTFAD5ybpReIHEtTEovOvlgiG4qXDMM4Q8PS8=.32431f2d-7f48-4b1d-abf8-df762a7f839d@github.com> > This PR addresses undefined behavior in CodeBuffer by making `verify_section_allocation` return early for a partially initialized CodeBuffer. Doug Simon has updated the pull request incrementally with two additional commits since the last revision: - initialize _total_start with nullptr instead of 0 - moved initialization of _total_start and _total_size ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24118/files - new: https://git.openjdk.org/jdk/pull/24118/files/bbab41b7..15d90178 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24118&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24118&range=00-01 Stats: 5 lines in 2 files changed: 2 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24118.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24118/head:pull/24118 PR: https://git.openjdk.org/jdk/pull/24118 From dnsimon at openjdk.org Thu Mar 20 20:06:49 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 20 Mar 2025 20:06:49 GMT Subject: RFR: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer [v2] In-Reply-To: References: 
<1L-ScFxwlky9Z_oCqPs2nQEcytWSbYseVLKnaiPoCao=.b190edda-5d41-4ced-8e56-dc7c8fbfc2be@github.com> Message-ID: <3IoECs1CM1ij6bjxVqEHGq_GLsHnFzmz_hvWv1Poptg=.2ad96946-f9f7-4a95-a1f0-c2466f93b32d@github.com> On Thu, 20 Mar 2025 18:00:56 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/asm/codeBuffer.hpp line 550: >> >>> 548: initialize_misc(name); >>> 549: _total_start = 0; >>> 550: _total_size = 0; >> >> May be we should move this initialization from `initialize()` to `initialize_misc()` so you don't need to do this here. > > Otherwise following constructor also doesn't set them. `initialize(csize_t code_size, csize_t locs_size)` does not. Ok, moved. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24118#discussion_r2006360524 From dhanalla at openjdk.org Thu Mar 20 20:08:10 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Thu, 20 Mar 2025 20:08:10 GMT Subject: RFR: 8350609: cleanup unknown unwind opcode (0xB) for windows In-Reply-To: <4BUZBMlLC1zPnsieDNsbkji0xcmHLd_VHRN3bEhpJ3A=.5d2b315c-20de-4770-93e1-846cd733cde0@github.com> References: <4BUZBMlLC1zPnsieDNsbkji0xcmHLd_VHRN3bEhpJ3A=.5d2b315c-20de-4770-93e1-846cd733cde0@github.com> Message-ID: On Fri, 28 Feb 2025 18:42:11 GMT, Vivek Deshpande wrote: > Did you get to test these functions for correctness, possibly using jtreg or some other tests ? Yes, we have validated the Jtreg Tier 1 tests, including vector-specific tests under /test/jdk/incubator/vector. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23707#issuecomment-2741542004 From dhanalla at openjdk.org Thu Mar 20 20:16:09 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Thu, 20 Mar 2025 20:16:09 GMT Subject: RFR: 8350609: Cleanup unknown unwind opcode (0xB) for windows In-Reply-To: References: <4BUZBMlLC1zPnsieDNsbkji0xcmHLd_VHRN3bEhpJ3A=.5d2b315c-20de-4770-93e1-846cd733cde0@github.com> Message-ID: On Thu, 20 Mar 2025 20:05:35 GMT, Dhamoder Nalla wrote: >> Did you get to test these functions for correctness, possibly using jtreg or some other tests ? > >> Did you get to test these functions for correctness, possibly using jtreg or some other tests ? > > Yes, we have validated the Jtreg Tier 1 tests, including vector-specific tests under /test/jdk/incubator/vector. > @dhanalla As @vivdesh asked above: do you have a regression test for this? > > You also have this warning above: > > Warning ⚠️ Found leading lowercase letter in issue title for 8350609: cleanup unknown unwind opcode (0xB) for windows I addressed the warning, and regarding regression tests, I have validated the Jtreg Tier 1 tests, including vector-specific tests under /test/jdk/incubator/vector. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23707#issuecomment-2741557396 From jbhateja at openjdk.org Thu Mar 20 20:24:44 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 20 Mar 2025 20:24:44 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v3] In-Reply-To: References: Message-ID: > This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 > > The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). > > Summary of changes included with the patch: > 1. C2 compiler New Vector IR creation. > 2. Auto-vectorization support. > 3. x86 backend implementation. > 4. New IR verification test for each newly supported vector operation. 
> > Following are the performance numbers of Float16OperationsBenchmark > > System : Intel(R) Xeon(R) Processor code-named Granite rapids > Frequency fixed at 2.5 GHz > > > Baseline > Benchmark (vectorDim) Mode Cnt Score Error Units > Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms > Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms > Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms > Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms > Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms > Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms > Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms > Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms > Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms > Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms > Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms > Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms > Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms > Float16OperationsBenchmark.isInfiniteCMovBenchmark 1024 thrpt 2 1962.579 ops/ms... Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - Review comments resolutions. 
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 - Updating benchmark - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 - Updating copyright - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 - Add MinVHF/MaxVHF to commutative op list - Auto Vectorization support for Float16 operations. ------------- Changes: https://git.openjdk.org/jdk/pull/22755/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=02 Stats: 966 lines in 18 files changed: 902 ins; 10 del; 54 mod Patch: https://git.openjdk.org/jdk/pull/22755.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755 PR: https://git.openjdk.org/jdk/pull/22755 From kvn at openjdk.org Thu Mar 20 20:48:17 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 20 Mar 2025 20:48:17 GMT Subject: RFR: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer [v2] In-Reply-To: <4lZVM2kTFAD5ybpReIHEtTEovOvlgiG4qXDMM4Q8PS8=.32431f2d-7f48-4b1d-abf8-df762a7f839d@github.com> References: <4lZVM2kTFAD5ybpReIHEtTEovOvlgiG4qXDMM4Q8PS8=.32431f2d-7f48-4b1d-abf8-df762a7f839d@github.com> Message-ID: <5t0YeG1y5BKm0Zgliihna1AB1NXWwTrJJMFMlC5Res0=.4b80473c-e58d-4ef1-a321-1aa27bdcc19f@github.com> On Thu, 20 Mar 2025 20:06:49 GMT, Doug Simon wrote: >> This PR addresses undefined behavior in CodeBuffer by making `verify_section_allocation` return early for a partially initialized CodeBuffer. 
> > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - initialize _total_start with nullptr instead of 0 > - moved initialization of _total_start and _total_size And you don't need them in [initialize(address code_start, csize_t code_size)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/codeBuffer.hpp#L487) ------------- PR Review: https://git.openjdk.org/jdk/pull/24118#pullrequestreview-2704068027 From dlong at openjdk.org Thu Mar 20 22:24:07 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 20 Mar 2025 22:24:07 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> <0pScaTfNpM4LYdukKYZIdk1BluQVvUM2BdssI4bjisI=.d92d627f-ce02-406c-bed3-38d31692c2ef@github.com> Message-ID: On Thu, 20 Mar 2025 19:25:14 GMT, Dean Long wrote: >>> There may be a bug in frequency propagation. >> >> I haven't looked at these failures but could it have to do with irreducible loops? This is a known reason of frequency propagation inaccuracy in C2, see [JDK-8258895](https://bugs.openjdk.org/browse/JDK-8258895). > > @robcasloz , that could be it, but I'm not enough of an expert. It seems to be this loop: > https://github.com/openjdk/jdk/blob/56038fb5a156568cce2e80f5db18b10ad61c06e4/test/jdk/java/foreign/TestHandshake.java#L104 > and the block gets detected as empty and moved to the end by PhaseCFG::remove_empty_blocks(). Roberto, do you have time to help look at this? What I'm seeing is a block with 2 NeverBranchNode successors, which get a _freq of 0 (later changed to MIN_BLOCK_FREQUENCY). 
B64: # out( B66 B65 ) <- in( N298 N301 ) Freq: 0 148 Loop === 148 297 296 [[ 148 173 174 147 ]] inner !jvms: TestHandshake$AbstractSegmentAccessor::run @ bci:12 (line 104) 147 NeverBranch === 148 [[ 172 146 ]] 172 CProj === 147 [[ 301 ]] #0 146 CProj === 147 [[ 295 ]] #1 Initially these successor blocks are mostly empty: B66: # out( B64 ) <- in( N148 ) Freq: 0 301 Region === 301 172 [[ 301 170 ]] 296 branch === 166 [[ 148 ]] !orig=230 B65: # out( B1 ) <- in( N148 ) Freq: 0 295 Region === 295 146 [[ 295 145 ]] 145 ShouldNotReachHere === 295 0 0 19 0 [[ 1 ]] but later we add nodes to B66 for some reason. Also B64 is detected as empty and moved to the end. At some point the NeverBranch is changed into a branchNode MachGotoNode, and that causes succ_prob() to return 1 instead of 0 for these edges. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r2006542741 From dlong at openjdk.org Thu Mar 20 22:33:06 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 20 Mar 2025 22:33:06 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: <6APmJgwNW3SFayLBSOzKwUzd2ORsrBQK7wE0CnX1_gY=.ffbbef06-c288-4cee-a3ec-cda5b94c23a1@github.com> <0pScaTfNpM4LYdukKYZIdk1BluQVvUM2BdssI4bjisI=.d92d627f-ce02-406c-bed3-38d31692c2ef@github.com> Message-ID: On Thu, 20 Mar 2025 22:21:53 GMT, Dean Long wrote: >> @robcasloz , that could be it, but I'm not enough of an expert. It seems to be this loop: >> https://github.com/openjdk/jdk/blob/56038fb5a156568cce2e80f5db18b10ad61c06e4/test/jdk/java/foreign/TestHandshake.java#L104 >> and the block gets detected as empty and moved to the end by PhaseCFG::remove_empty_blocks(). Roberto, do you have time to help look at this? > > What I'm seeing is a block with 2 NeverBranchNode successors, which get a _freq of 0 (later changed to MIN_BLOCK_FREQUENCY). 
> > B64: # out( B66 B65 ) <- in( N298 N301 ) Freq: 0 > 148 Loop === 148 297 296 [[ 148 173 174 147 ]] inner !jvms: TestHandshake$AbstractSegmentAccessor::run @ bci:12 (line 104) > 147 NeverBranch === 148 [[ 172 146 ]] > 172 CProj === 147 [[ 301 ]] #0 > 146 CProj === 147 [[ 295 ]] #1 > > Initially these successor blocks are mostly empty: > > B66: # out( B64 ) <- in( N148 ) Freq: 0 > 301 Region === 301 172 [[ 301 170 ]] > 296 branch === 166 [[ 148 ]] !orig=230 > > > B65: # out( B1 ) <- in( N148 ) Freq: 0 > 295 Region === 295 146 [[ 295 145 ]] > 145 ShouldNotReachHere === 295 0 0 19 0 [[ 1 ]] > > but later we add nodes to B66 for some reason. Also B64 is detected as empty and moved to the end. At some point the NeverBranch is changed into a branchNode MachGotoNode, and that causes succ_prob() to return 1 instead of 0 for these edges. According to the comment for PhaseCFG::convert_NeverBranch_to_Goto(), this does mean it's an infinite loop. This means the target block probability should be 1.0, right? But we forced it to 0.0 when we saw the NeverBranch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23962#discussion_r2006549929 From sviswanathan at openjdk.org Thu Mar 20 23:55:08 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 20 Mar 2025 23:55:08 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v3] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 20:24:44 GMT, Jatin Bhateja wrote: >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 >> >> The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). >> >> Summary of changes included with the patch: >> 1. C2 compiler New Vector IR creation. >> 2. Auto-vectorization support. >> 3. x86 backend implementation. >> 4. New IR verification test for each newly supported vector operation. 
>> >> Following are the performance numbers of Float16OperationsBenchmark >> >> System : Intel(R) Xeon(R) Processor code-named Granite rapids >> Frequency fixed at 2.5 GHz >> >> >> Baseline >> Benchmark (vectorDim) Mode Cnt Score Error Units >> Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms >> Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms >> Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms >> Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms >> Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms >> Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms >> Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms >> Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms >> Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms >> Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms >> Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms >> Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms >> Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms >> Float16OperationsBenchmark.isInfiniteCMovBen... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Review comments resolutions. 
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 > - Updating benchmark > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 > - Updating copyright > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 > - Add MinVHF/MaxVHF to commutative op list > - Auto Vectorization support for Float16 operations. test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 62: > 60: return !expected_fp16.equals(actual_fp16); > 61: } > 62: return false; This should be reversed: if (isNaN(expected_fp16) ^ isNaN(actual_fp16)) { return false; } return !expected_fp16.equals(actual_fp16); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r2006621788 From fyang at openjdk.org Fri Mar 21 00:27:06 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 21 Mar 2025 00:27:06 GMT Subject: RFR: 8352529: RISC-V: enable loopopts tests In-Reply-To: References: Message-ID: <8ytIDwSsn2ZF0WRCCS5DcdolHq-qn1a9do0Rib8oSjk=.ccfe5d39-0967-4056-98ba-d04c8952f394@github.com> On Thu, 20 Mar 2025 17:28:08 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? This is also a follow-up of https://github.com/openjdk/jdk/pull/23985. > There are a bunch of tests under `test/hotspot/jtreg/compiler/loopopts/` that could be enabled for riscv. > > There are some failures for some tests after enabling them, I'll investigate further and send out a separate PR after fixing them. > > Thanks! Looks fine to me. Thanks! ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24138#pullrequestreview-2704431371 From jbhateja at openjdk.org Fri Mar 21 06:30:02 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 21 Mar 2025 06:30:02 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v4] In-Reply-To: References: Message-ID: <8J4SAkIC2XVaD4cwGfK7LkJzz2IV7WIrjwbCjhBquXM=.a8f5276f-ac25-465a-9d3d-f23dd3f0ec0e@github.com> > This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 > > The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). > > Summary of changes included with the patch: > 1. C2 compiler New Vector IR creation. > 2. Auto-vectorization support. > 3. x86 backend implementation. > 4. New IR verification test for each newly supported vector operation. > > Following are the performance numbers of Float16OperationsBenchmark > > System : Intel(R) Xeon(R) Processor code-named Granite rapids > Frequency fixed at 2.5 GHz > > > Baseline > Benchmark (vectorDim) Mode Cnt Score Error Units > Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms > Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms > Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms > Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms > Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms > Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms > Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms > Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms > Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms > 
Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms > Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms > Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms > Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms > Float16OperationsBenchmark.isInfiniteCMovBenchmark 1024 thrpt 2 1962.579 ops/ms... Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. The pull request now contains seven commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 - Updating benchmark - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 - Updating copyright - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 - Add MinVHF/MaxVHF to commutative op list - Auto Vectorization support for Float16 operations. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22755/files - new: https://git.openjdk.org/jdk/pull/22755/files/784a4ecd..2f0dac54 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=02-03 Stats: 126 lines in 4 files changed: 3 ins; 104 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/22755.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755 PR: https://git.openjdk.org/jdk/pull/22755 From dlunden at openjdk.org Fri Mar 21 07:19:11 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 21 Mar 2025 07:19:11 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v8] In-Reply-To: <2uVU0x7eKKHHDLTEAftG-e9qUSMG95Z968T3uyMsKDc=.f24889f5-098b-4b7b-9943-a794564b69f0@github.com> References: <6l8orDGDTI-ADWxEmDjMPX1uorIhxLd3T55s0eIzJ3I=.0cb9d2c8-4302-408f-b64e-dc9a8e3d4145@github.com> <2uVU0x7eKKHHDLTEAftG-e9qUSMG95Z968T3uyMsKDc=.f24889f5-098b-4b7b-9943-a794564b69f0@github.com> 
Message-ID: On Thu, 20 Mar 2025 13:52:55 GMT, Emanuel Peter wrote: >> Keep alive > > @dlunde What's the state with this one? Are you looking for reviews? @eme64: Yes, looking for reviews! But, let us check with @robcasloz before you start a review. He mentioned he was also going to review this (and has partially reviewed and contributed to the changeset already), and since it is quite a large changeset it would be good to coordinate our efforts. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2742545509 From chagedorn at openjdk.org Fri Mar 21 07:52:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 21 Mar 2025 07:52:12 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v6] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 16:18:33 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. >> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". >> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java > > Co-authored-by: Christian Hagedorn I've tried it with latest master but could not reproduce it. 
Adding nodes to the CCP worklist should not directly be an issue. So, I expect this patch just revealed an existing issue with JDK-8346664. > I suspect it might be the former, in which case the question would be whether we want to go ahead with this fix for the CCP worklist to reduce noise? Yes, I guess we can continue with this patch to reduce the noise in the CI and follow up with another bug fix to address the problem I reported. Can you file a bug accordingly? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23871#issuecomment-2742596698 From epeter at openjdk.org Fri Mar 21 08:27:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 08:27:08 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 13:27:00 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]`; as a >> consequence, `i` is known to be `[0, arraySize-1]`. The range check >> can be eliminated. >> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. >> >> What I propose here is that when the `CastII` becomes dead, any CFG >> path that uses the `CastII` node is made unreachable. 
So in pseudo code: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered. >> >> The patch applies this to all `Type` nodes as, with 8275202, I also ran >> into some rare corner cases with other types of nodes. The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). >> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII)` >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out of bound value occurs. >> >> This is likely fixable some other way (even though it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that there are more hiding), I think it would >> make sense to try some other way of approaching them. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review src/hotspot/share/opto/node.cpp line 3063: > 3061: // be unreachable as using a dead value makes no sense. For the Type node to capture a narrowed down type, some control > 3062: // flow construct must guard the Type node (an If node usually). When the Type node becomes dead, the guard usually > 3063: // constant fold and the control flow that leads to the Type node becomes unreachable. 
There are cases where that doesn't Suggestion: // constant folds and the control flow that leads to the Type node becomes unreachable. There are cases where that doesn't src/hotspot/share/opto/node.cpp line 3100: > 3098: loop->register_new_node(frame, igvn->C->start()); > 3099: } > 3100: Node* halt = new HaltNode(c, frame, "dead path discovered by TypeNode"); The more info we can attach to the `HaltNode`, the better. It would make debugging easier if it is ever hit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2005223998 PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2005226292 From epeter at openjdk.org Fri Mar 21 08:43:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 08:43:10 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 08:16:20 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Thanks Roberto for the evaluation! That looks promising. I've discussed this with @chhagedorn in the office yesterday. Here is what I remember from that conversation. - This patch here solves some real issues: multiple conditions together are impossible, but it is difficult to prove that directly from the conditions, i.e. it would almost take some theorem prover to show that the branch is impossible. But sometimes the data nodes come to an empty type. - There are also cases where we could have easily enabled constant folding for the branch, and we are just suffering from data nodes being smarter than CFG nodes. Your patch would "fix" these bugs, but then we still have a branch in the code that in theory could have been folded, but was not. This may be performance-wise slightly suboptimal, and we probably will struggle to find these cases. 
But most likely, the performance there is not critical. - There is a bit of a question about how much overhead adding in the HaltNodes produces. It seems on Roberto's benchmark there was no noticeable difference. That's good. Still, this is only one benchmark, and I'm wondering what could happen in the worst case. If the graph was slowly dying from the bottom up, could it happen that we essentially gradually replace every CFG node with a HaltNode, as the dying slowly propagates up the whole graph? Maybe that's still not terrible enough to do something smarter yet. If we find such a case, we could still think about it later. - Should we keep the HaltNodes in the graph? The question here is whether we assume that: - these HaltNodes should never be taken, because the data nodes have proven that the path is impossible? Then we could actually just constant fold the if before the HaltNode. - these HaltNodes may be taken, because maybe there is a bug and only that led to the constant folding of the data node. Hitting a HaltNode would be proof of a bug, and so we should keep them. It is better to crash the program than to continue in an inconsistent state down the wrong branch. Deopt would have been desirable, but restoring the state is probably near impossible. - And we have to be aware that we will now not learn about bugs the old "bad graph" way, but the graph will just be cleaned up with HaltNodes, which hides those bugs. Maybe that is ok, because those bugs are annoying and mostly happen in dead code anyway, so maybe we were wasting our time on those? Not sure, maybe there were also some real bugs mixed in there... and it would be a shame to miss those. TLDR: I'm in favour of this patch. There are some risks, but the benefits are probably worth it. There would have to be some more documentation, probably. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2742688738 From duke at openjdk.org Fri Mar 21 08:46:09 2025 From: duke at openjdk.org (Marc Chevalier) Date: Fri, 21 Mar 2025 08:46:09 GMT Subject: RFR: 8314999: IR framework fails to detect allocation [v6] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 13:26:51 GMT, Marc Chevalier wrote: >> Bring the `ALLOC(_ARRAY)?(_OF)?` IR framework regex into the modern era! >> >> Rather than matching on the OptoAssembly, we now match before macro expansion. Indeed, matching on OptoAssembly is fragile: between the register being assigned the class to allocate and the actual call to `_new_instance_Java`, there might >> be register spill, making the match hard, and fragile. Now, these regex are applied before macro expansion. >> >> To make that work, I needed to adapt the dump_spec of `AllocateNode` to print the allocated class, in case it is a precise constant. This is also nice to have as a human reading the output. >> >> The new feature is also slightly more precise in case we want to match the allocation of a given class (that is `ALLOC(_ARRAY)?_OF`). It used to be along the lines of: >> >> "precise .*" + IS_REPLACED + ":" >> >> which is actually too lenient: it only asserts the suffix is what is expected. On the plus side, if we wanted to have `MyClass`, then `some/package/MyClass` would match, but on the other hand, `ItLooksLikeButIsNotMyClass` would also match. The new regex uses a word boundary: >> >> "allocationKlass:.*\\b" + IS_REPLACED + "\\s" >> >> which makes it a bit more specific: the given name cannot be extended with only letters: there must be a non-letter char. For instance, `ItLooksLikeButIsNotMyClass` wouldn't work anymore, since there is no word boundary between `ItLooksLikeButIsNot` and `MyClass`. It can also not be extended on the right into `MyClassButNotReally` because of the space character that must exist after the searched string. 
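The difference between the two patterns quoted above can be checked in isolation. The following is only a minimal sketch, not code from the PR: the class `AllocRegexSketch`, its helper methods, and the sample dump lines are made up for illustration, and the class name is spliced in directly where the framework would substitute `IS_REPLACED`.

```java
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not part of the IR framework.
class AllocRegexSketch {
    // Old-style pattern: only the suffix of the class name is constrained.
    static Pattern oldStyle(String className) {
        return Pattern.compile("precise .*" + className + ":");
    }

    // New-style pattern: a word boundary before the name, whitespace after it.
    static Pattern newStyle(String className) {
        return Pattern.compile("allocationKlass:.*\\b" + className + "\\s");
    }

    static boolean found(Pattern p, String line) {
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        // Made-up sample lines, loosely shaped like the dumps discussed above.
        String oldLine = "precise ItLooksLikeButIsNotMyClass:";
        String exact = "allocationKlass:some/package/MyClass ";
        String prefixed = "allocationKlass:ItLooksLikeButIsNotMyClass ";
        String suffixed = "allocationKlass:MyClassButNotReally ";
        String longerPath = "allocationKlass:a/prefix/some/package/MyClass ";

        System.out.println(found(oldStyle("MyClass"), oldLine));   // true: too lenient
        System.out.println(found(newStyle("MyClass"), exact));     // true
        System.out.println(found(newStyle("MyClass"), prefixed));  // false: no word boundary
        System.out.println(found(newStyle("MyClass"), suffixed));  // false: no whitespace after
        // The known limitation: a longer package path still matches.
        System.out.println(found(newStyle("some/package/MyClass"), longerPath)); // true
    }
}
```

The `/` before `MyClass` is a non-word character, so `\b` matches there, which is also why an extended package path still slips through.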
>> >> The case of array allocations is slightly more tricky, but essentially similar. >> >> It is not quite fool-proof since a package path can still be extended, e.g. >> >> @IR(counts = {IRNode.ALLOC_OF, "some/package/MyClass", "1"}) >> >> will also match allocations of `a/prefix/some/package/MyClass`. >> >> I think it's an acceptable limitation. >> >> Let's have a little preview of the new `dump_spec`. For instance, it could have been something like: >> >> 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) !jvms: Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) >> >> and now it is >> >> 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/l... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright year Thanks for reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24093#issuecomment-2742693410 From duke at openjdk.org Fri Mar 21 08:46:09 2025 From: duke at openjdk.org (duke) Date: Fri, 21 Mar 2025 08:46:09 GMT Subject: RFR: 8314999: IR framework fails to detect allocation [v6] In-Reply-To: References: Message-ID: <0_w5dDBl36_uKOLGeAN9khusDXwJ-4VVhNNeRgZwfu4=.460e54a3-e512-41be-90cf-7af6ec3cb95c@github.com> On Wed, 19 Mar 2025 13:26:51 GMT, Marc Chevalier wrote: >> Bring the `ALLOC(_ARRAY)?(_OF)?` IR framework regex in the modern era! >> >> Rather than matching on the OptoAssembly, we now match before macro expansion. Ideed, matching on OptoAssembly is fragile: between the register being assigned the class to allocate and the actual call to `_new_instance_Java`, there might >> be register spill, making the match hard, and fragile. Now, these regex are applied before macro expansion. 
>> >> To make that work, I needed to adapt the dump_spec of `AllocateNode` to print the allocated class, in case it is a precise constant. This is also nice to have as a human reading the output. >> >> The new feature is also slightly more precise in case we want to match the allocation of a given class (that is `ALLOC(_ARRAY)?_OF`). It used to be along the lines of: >> >> "precise .*" + IS_REPLACED + ":" >> >> which is actually too lenient: it only assert the suffix is what is expected. On the plus side, if we wanted to have `MyClass`, then `some/package/MyClass` would match, but on the other hand, `ItLooksLikeButIsNotMyClass` would also match. The new regex use a word boundary: >> >> "allocationKlass:.*\\b" + IS_REPLACED + "\\s" >> >> which make it a bit more specific: the given name cannot be extended with only letters: there must be a non-letter char. For instance, `ItLooksLikeButIsNotMyClass` wouldn't work anymore, since there is no word boundary between `ItLooksLikeButIsNot` and `MyClass`. It can also not be extended on the right into `MyClassButNotReally` because of the space character that must exist after the searched string. >> >> The case of array allocations is slightly more tricky, but essentially similar. >> >> It is not quite fool-proof since a package path can still be extended, e.g. >> >> @IR(counts = {IRNode.ALLOC_OF, "some/package/MyClass", "1"}) >> >> will also match allocations of `a/prefix/some/package/MyClass`. >> >> I think it's an acceptable limitation. >> >> Let's have a little preview of the new `dump_spec`. 
For instance, it could have been something like: >> >> 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) !jvms: Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) >> >> and now it is >> >> 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/l... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright year @marc-chevalier Your change (at version d1922cf4b704d055abb0496c9a091a9f2c782be9) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24093#issuecomment-2742695435 From epeter at openjdk.org Fri Mar 21 08:48:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 08:48:18 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 13:27:00 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]` as a >> consequence, `i` is known to be `[0, arraySize-1]`. The range check >> can be eliminated. >> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. 
And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. >> >> What I propose here is that when the `CastII` becomes dead, any CFG >> paths that use the `CastII` node is made unreachable. So in pseudo code: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered. >> >> The patch applies this to all `Type` nodes as with 8275202, I also ran >> in some rare corner cases with other types of nodes. The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). >> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII)` >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out of bound value occurs. >> >> This is likely fixable some other way (eventhough it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that they are more hiding), I think it would >> make sense to try some other way of approaching them. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review I know you explained it above in text form, but there should be an explanation at `make_paths_from_here_dead` to say why we cannot just put the `HaltNode` at the ctrl of the `CastII` for example. 
I think the idea is that the CastII could have its control hoisted, and so not all paths dominated by that control are actually dead... only those that use the CastII. It could be nice to have some concrete example, maybe some ASCII art showing the dead and live paths? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2742699975 From jbhateja at openjdk.org Fri Mar 21 08:55:01 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 21 Mar 2025 08:55:01 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v5] In-Reply-To: References: Message-ID: > This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 > > The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). > > Summary of changes included with the patch: > 1. C2 compiler New Vector IR creation. > 2. Auto-vectorization support. > 3. x86 backend implementation. > 4. New IR verification test for each newly supported vector operation. 
> > Following are the performance numbers of Float16OperationsBenchmark > > System : Intel(R) Xeon(R) Processor code-named Granite rapids > Frequency fixed at 2.5 GHz > > > Baseline > Benchmark (vectorDim) Mode Cnt Score Error Units > Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms > Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms > Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms > Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms > Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms > Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms > Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms > Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms > Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms > Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms > Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms > Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms > Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms > Float16OperationsBenchmark.isInfiniteCMovBenchmark 1024 thrpt 2 1962.579 ops/ms... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22755/files - new: https://git.openjdk.org/jdk/pull/22755/files/2f0dac54..1963d4b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=03-04 Stats: 285 lines in 8 files changed: 219 ins; 19 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/22755.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755 PR: https://git.openjdk.org/jdk/pull/22755 From jbhateja at openjdk.org Fri Mar 21 08:55:03 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 21 Mar 2025 08:55:03 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v3] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 23:52:19 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. > > test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 62: > >> 60: return !expected_fp16.equals(actual_fp16); >> 61: } >> 62: return false; > > This should be reverse: > if (isNaN(expected_fp16) ^ isNaN(actual_fp16)) { > return false; > } > return !expected_fp16.equals(actual_fp16); My bad. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r2007111082 From duke at openjdk.org Fri Mar 21 08:57:16 2025 From: duke at openjdk.org (Marc Chevalier) Date: Fri, 21 Mar 2025 08:57:16 GMT Subject: Integrated: 8314999: IR framework fails to detect allocation In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 08:43:22 GMT, Marc Chevalier wrote: > Bring the `ALLOC(_ARRAY)?(_OF)?` IR framework regex in the modern era! > > Rather than matching on the OptoAssembly, we now match before macro expansion. 
Ideed, matching on OptoAssembly is fragile: between the register being assigned the class to allocate and the actual call to `_new_instance_Java`, there might > be register spill, making the match hard, and fragile. Now, these regex are applied before macro expansion. > > To make that work, I needed to adapt the dump_spec of `AllocateNode` to print the allocated class, in case it is a precise constant. This is also nice to have as a human reading the output. > > The new feature is also slightly more precise in case we want to match the allocation of a given class (that is `ALLOC(_ARRAY)?_OF`). It used to be along the lines of: > > "precise .*" + IS_REPLACED + ":" > > which is actually too lenient: it only assert the suffix is what is expected. On the plus side, if we wanted to have `MyClass`, then `some/package/MyClass` would match, but on the other hand, `ItLooksLikeButIsNotMyClass` would also match. The new regex use a word boundary: > > "allocationKlass:.*\\b" + IS_REPLACED + "\\s" > > which make it a bit more specific: the given name cannot be extended with only letters: there must be a non-letter char. For instance, `ItLooksLikeButIsNotMyClass` wouldn't work anymore, since there is no word boundary between `ItLooksLikeButIsNot` and `MyClass`. It can also not be extended on the right into `MyClassButNotReally` because of the space character that must exist after the searched string. > > The case of array allocations is slightly more tricky, but essentially similar. > > It is not quite fool-proof since a package path can still be extended, e.g. > > @IR(counts = {IRNode.ALLOC_OF, "some/package/MyClass", "1"}) > > will also match allocations of `a/prefix/some/package/MyClass`. > > I think it's an acceptable limitation. > > Let's have a little preview of the new `dump_spec`. 
For instance, it could have been something like: > > 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) !jvms: Test$MyClass:: @ bci:23 (line 43) Test::test @ bci:5 (line 48) > > and now it is > > 315 Allocate === 290 287 273 8 1 (94 313 23 1 1 10 43 43 10 43 ) [[ 316 317 318 325 326 327 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) allocationKlass:java/util/Ha... This pull request has now been integrated. Changeset: 466f82a4 Author: Marc Chevalier Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/466f82a49996242d900a444931017261a427f9ea Stats: 155 lines in 4 files changed: 114 ins; 12 del; 29 mod 8314999: IR framework fails to detect allocation Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24093 From jbhateja at openjdk.org Fri Mar 21 08:58:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 21 Mar 2025 08:58:17 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v5] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 08:55:01 GMT, Jatin Bhateja wrote: >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 >> >> The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). >> >> Summary of changes included with the patch: >> 1. C2 compiler New Vector IR creation. >> 2. Auto-vectorization support. >> 3. x86 backend implementation. >> 4. New IR verification test for each newly supported vector operation. 
>> >> Following are the performance numbers of Float16OperationsBenchmark >> >> System : Intel(R) Xeon(R) Processor code-named Granite rapids >> Frequency fixed at 2.5 GHz >> >> >> Baseline >> Benchmark (vectorDim) Mode Cnt Score Error Units >> Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms >> Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms >> Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms >> Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms >> Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms >> Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms >> Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms >> Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms >> Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms >> Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms >> Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms >> Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms >> Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms >> Float16OperationsBenchmark.isInfiniteCMovBen... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution. 
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7116: > 7114: KRegister ktmp, XMMRegister xtmp1, XMMRegister xtmp2) { > 7115: vector_max_min_fp16(opcode, dst, src1, src2, ktmp, xtmp1, xtmp2, Assembler::AVX_128bit); > 7116: } Created a separate JBS entry to track this special handling for scalar MAX/MIN https://bugs.openjdk.org/browse/JDK-8352585 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r2007114926 From epeter at openjdk.org Fri Mar 21 09:02:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 09:02:11 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> Message-ID: On Tue, 18 Mar 2025 08:09:50 GMT, kuaiwei wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert extract value and add more tests > > @eme64 @robcasloz I think the patch for merge loads optimization is ready for PR, could you take time to review it? Thanks. @kuaiwei Just ping me when you would like me to re-review :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2742730320 From mli at openjdk.org Fri Mar 21 09:17:10 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 21 Mar 2025 09:17:10 GMT Subject: RFR: 8352529: RISC-V: enable loopopts tests In-Reply-To: <8ytIDwSsn2ZF0WRCCS5DcdolHq-qn1a9do0Rib8oSjk=.ccfe5d39-0967-4056-98ba-d04c8952f394@github.com> References: <8ytIDwSsn2ZF0WRCCS5DcdolHq-qn1a9do0Rib8oSjk=.ccfe5d39-0967-4056-98ba-d04c8952f394@github.com> Message-ID: On Fri, 21 Mar 2025 00:24:33 GMT, Fei Yang wrote: > Looks fine to me. Thanks! Thank you! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24138#issuecomment-2742769082 From epeter at openjdk.org Fri Mar 21 09:21:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 09:21:25 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 07:28:04 GMT, Xiaohong Gong wrote: >> Hi @eme64, the IR test is updated according to your suggestion. Could you please look at it again? Thanks so much! > >> @XiaohongGong Could you please also merge here before I rerun the testing? > > Sure and have rebased. Thanks a lot for your testing! @XiaohongGong Tests launched! Please ping me after the weekend for the results. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2742777655 From epeter at openjdk.org Fri Mar 21 09:23:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 09:23:23 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v8] In-Reply-To: References: <6l8orDGDTI-ADWxEmDjMPX1uorIhxLd3T55s0eIzJ3I=.0cb9d2c8-4302-408f-b64e-dc9a8e3d4145@github.com> <2uVU0x7eKKHHDLTEAftG-e9qUSMG95Z968T3uyMsKDc=.f24889f5-098b-4b7b-9943-a794564b69f0@github.com> Message-ID: On Fri, 21 Mar 2025 07:16:53 GMT, Daniel Lundén wrote: >> @dlunde What's the state with this one? Are you looking for reviews? > > @eme64: Yes, looking for reviews! But, let us check with @robcasloz before you start a review. He mentioned he was also going to review this (and has partially reviewed and contributed to the changeset already), and since it is quite a large changeset it would be good to coordinate our efforts. 
@dlunde Ok, then I think it makes more sense if @robcasloz reviews this first, and then me second :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2742783168 From epeter at openjdk.org Fri Mar 21 09:25:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 09:25:08 GMT Subject: RFR: 8350609: Cleanup unknown unwind opcode (0xB) for windows In-Reply-To: References: <4BUZBMlLC1zPnsieDNsbkji0xcmHLd_VHRN3bEhpJ3A=.5d2b315c-20de-4770-93e1-846cd733cde0@github.com> Message-ID: On Thu, 20 Mar 2025 20:13:08 GMT, Dhamoder Nalla wrote: >>> Did you get to test these functions for correctness, possibly using jtreg or some other tests? >> >> Yes, we have validated the Jtreg Tier 1 tests, including vector-specific tests under /test/jdk/incubator/vector. > >> @dhanalla As @vivdesh asked above: do you have a regression test for this? >> >> You also have this warning above: >> >> Warning: Found leading lowercase letter in issue title for 8350609: cleanup unknown unwind opcode (0xB) for windows > I addressed the warning, and regarding regression tests, I have validated the Jtreg Tier 1 tests, including vector-specific tests under /test/jdk/incubator/vector. @dhanalla Thanks for fixing the warning and running some tests! From my understanding, those tests passed before your patch here, correct? If so, then I'm wondering if there could be a regression test for this "unknown unwind opcode", that fails before your patch and passes with your patch? How feasible is that? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23707#issuecomment-2742788767 From epeter at openjdk.org Fri Mar 21 09:26:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 09:26:20 GMT Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2] In-Reply-To: References: <5fk0CfImI-utRMfmsA78i_uQJjoej9bsLqxsAbRZHLk=.b5aece2b-090d-4fbc-aa64-3acf8fedc41b@github.com> Message-ID: On Thu, 20 Mar 2025 07:28:58 GMT, Xiaohong Gong wrote: >> @XiaohongGong Can you please merge with master before I launch testing? > > Hi @eme64 I've rebased this PR. Thanks a lot for your testing! @XiaohongGong Testing launched! Please ping me after the weekend for the results ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23608#issuecomment-2742790652 From duke at openjdk.org Fri Mar 21 09:26:19 2025 From: duke at openjdk.org (David Linus Briemann) Date: Fri, 21 Mar 2025 09:26:19 GMT Subject: Integrated: 8352512: TestVectorZeroCount: counter not reset between iterations In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 14:31:47 GMT, David Linus Briemann wrote: > 8352512: TestVectorZeroCount: counter not reset between iterations This pull request has now been integrated. 
Changeset: 1c0fa0af Author: David Linus Briemann Committer: Martin Doerr URL: https://git.openjdk.org/jdk/commit/1c0fa0af7847d80fd3fbe38f28207aab270609b3 Stats: 11 lines in 1 file changed: 7 ins; 0 del; 4 mod 8352512: TestVectorZeroCount: counter not reset between iterations Reviewed-by: mdoerr, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24134 From epeter at openjdk.org Fri Mar 21 09:34:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 09:34:27 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v14] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 22:55:18 GMT, Srinivas Vamsi Parasa wrote: >>> LGTM, >>> >>> Please file a JBS on future modification in assembler layer for EEVEX to REX/REX2 encoding and append to this PR before committing. >>> >>> Thanks. >> >> Thanks for the review Jatin! The JBS for EEVEX to REX/REX2 demotion has been created: https://bugs.openjdk.org/browse/JDK-8351994 >> >> Thanks, >> Vamsi > >> > @vamsi-parasa I tried to launch testing, but my script fails because of some merge issue. Would you mind merging from master? >> >> Hi Emanuel (@eme64), please see the updated code after the merge with master. > > Hi Emanuel (@eme64), could you please let me know if you're still seeing script failure? @vamsi-parasa I launched testing now. Please ping me after the weekend for results :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2742804110 From epeter at openjdk.org Fri Mar 21 09:34:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 09:34:27 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v12] In-Reply-To: References: <7vVEaiKEa__OeHQl8RjoNE_egAjB4zHpdQ3OKYLgbaI=.8fb97820-8e96-4761-8887-05ea5f4c2373@github.com> Message-ID: On Thu, 13 Mar 2025 17:53:35 GMT, Srinivas Vamsi Parasa wrote: > > I had a quick look over this. It's a bit hard to review for me, because it is basically about specific APX instructions. 
We probably have to heavily rely on testing. But APX hardware is not yet available, right? > > How can we best test this? Is there any way to emulate, maybe using SDE? What testing did you run for this? > > The code was tested using the SDE emulator. Apart from small Java-based unit tests to check if the correct instruction is being emitted, the APX enabling was also tested on SPECjvm2008 workloads. Ok, I suppose that does not give us great coverage, but it is hard to do much better until we have the actual hardware... thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2742811294 From epeter at openjdk.org Fri Mar 21 09:45:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 09:45:10 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: References: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> Message-ID: <7Fzrb4C-4VyJlOMUaaFqhTzlj4o7dVXS8-EkLCiVVA4=.367fe44f-540c-49cc-aa1c-3ab381febd32@github.com> On Fri, 14 Mar 2025 11:51:51 GMT, Quan Anh Mai wrote: >>> I would expect the `memory_type` of a `StoreC` into a `long[]` to be something that means "a part of a `long[]`" >> >> If that was the intended meaning of `MemNode::memory_type()`, wouldn't the function be redundant, because we can retrieve that information from `MemNode::adr_type()` already? > > @robcasloz Yes that's right. Then `MemNode::memory_type()` does not refer to the thing in memory at all, but the thing that is about to interact with the memory. I think: > > - We should rename it to `MemNode::value_type()` or `MemNode::value_basic_type()` > - It is simply incorrect to use it to reason about the thing in the memory in this problem, and using `adr_type` is the correct fix. > > To be clear, I don't think having `StoreSNode` would solve any issue. I can `StoreS` into a `char[]`, and `StoreC` into a `short[]` and we are back at the same issue. 
@merykitty You are right, `MemNode::memory_type` is very easy to misunderstand. We could probably rename it, and while doing that check all usages. We have had bugs like this before, I think I had one in SuperWord as well some years ago... What would be a better name though? Quickly looking at the cases, there are not even that many usages: emanuel at emanuel-oracle:/oracle-work/jdk-fork0/open$ grep memory_type src/hotspot/share/opto/ -r src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const = 0; src/hotspot/share/opto/memnode.hpp: return type2aelembytes(memory_type(), true); src/hotspot/share/opto/memnode.hpp: return type2aelembytes(memory_type()); src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_BYTE; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_BYTE; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_CHAR; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_SHORT; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_INT; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_LONG; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_FLOAT; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_DOUBLE; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_ADDRESS; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_NARROWOOP; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_NARROWKLASS; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_BYTE; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_CHAR; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_INT; } 
src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_LONG; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_FLOAT; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_DOUBLE; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_ADDRESS; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_NARROWOOP; } src/hotspot/share/opto/memnode.hpp: virtual BasicType memory_type() const { return T_NARROWKLASS; } src/hotspot/share/opto/escape.cpp: // StoreP::memory_type() == T_ADDRESS src/hotspot/share/opto/escape.cpp: store->as_Store()->memory_type() == ft) { src/hotspot/share/opto/vectornode.hpp: virtual BasicType memory_type() const { return T_VOID; } src/hotspot/share/opto/vectornode.hpp: virtual BasicType memory_type() const { return T_VOID; } src/hotspot/share/opto/memnode.cpp: if (memory_type() != T_VOID) { src/hotspot/share/opto/memnode.cpp: return phase->zerocon(memory_type()); src/hotspot/share/opto/memnode.cpp: memory_type(), is_unsigned()); src/hotspot/share/opto/memnode.cpp: const Type* con_type = Type::make_constant_from_field(const_oop->as_instance(), off, is_unsigned(), memory_type()); src/hotspot/share/opto/superword.cpp: bt = n->as_Mem()->memory_type(); src/hotspot/share/opto/superword.cpp: bt = n->as_Mem()->memory_type(); src/hotspot/share/opto/superword.cpp: is_java_primitive(mem->memory_type())) { src/hotspot/share/opto/superword.cpp: if (!is_java_primitive(s1->as_Mem()->memory_type()) || src/hotspot/share/opto/superword.cpp: !is_java_primitive(s2->as_Mem()->memory_type())) { src/hotspot/share/opto/superword.cpp: BasicType bt = n->as_Mem()->memory_type(); src/hotspot/share/opto/loopTransform.cpp: BasicType t = store->as_Mem()->memory_type(); src/hotspot/share/opto/loopTransform.cpp: if (type2aelembytes(store->as_Mem()->memory_type(), true) != (1 << n->in(2)->get_int())) { 
src/hotspot/share/opto/loopTransform.cpp: BasicType t = store->as_Mem()->memory_type(); Well, I looked through them, and I cannot see any issue with the other cases. But maybe someone else can give the usages a quick look too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2742835812 From epeter at openjdk.org Fri Mar 21 09:46:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 09:46:10 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v3] In-Reply-To: References: Message-ID: On Mon, 17 Feb 2025 15:00:51 GMT, Jasmine Karthikeyan wrote: >> @jaskarth just ping me whenever I should have a look again! > > @eme64 I think it should be good for another look over! I've addressed your review comments in the last commit. > > About the potential for performance degradation, I think it would be unlikely since the code generated by the cast is quite small (as it only needs to truncate or sign-extend) and the patch increases the amount of possible code that can auto-vectorize. The one case that I can think of is that it might cause code that would be otherwise unprofitable to become vectorizable, but that would be because we don't have a cost model yet. @jaskarth Let me know if there is anything we can help you with here :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2742838142 From epeter at openjdk.org Fri Mar 21 09:49:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 09:49:27 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v16] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 15:51:53 GMT, Marc Chevalier wrote: >> This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). Care must be taken in the case con1 + con2 is bigger than the number of bits in the integer type. In this case, we must simplify to 0. 
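To illustrate why the overflow case in the quoted description needs special care, here is a minimal Java sketch for 32-bit `int` values. The class `ShiftChainSketch` and the helper `foldedShift` are hypothetical names chosen for illustration; they are not part of the patch, and the sketch only models the arithmetic, not the C2 transformation itself. It relies on the fact that Java masks `int` shift counts to their low 5 bits, so a naive `x << (con1 + con2)` would return `x` rather than 0 once the sum reaches 32.

```java
public class ShiftChainSketch {
    // Hypothetical helper: folds (x << con1) << con2 into a single shift for 32-bit ints.
    static int foldedShift(int x, int con1, int con2) {
        int c1 = con1 & 31; // Java masks int shift counts to the low 5 bits
        int c2 = con2 & 31;
        int sum = c1 + c2;
        // A naive x << sum would be masked to x << (sum & 31), so once all bits
        // are shifted out the result has to be produced as 0 explicitly.
        return sum >= 32 ? 0 : x << sum;
    }

    public static void main(String[] args) {
        int x = 0x12345678;
        // Ordinary case: the two shifts collapse into one.
        System.out.println(((x << 3) << 16) == foldedShift(x, 3, 16));
        // Overflow case: every bit is shifted out, so both sides are 0.
        System.out.println(((x << 20) << 12) == foldedShift(x, 20, 12));
        // The pitfall: a shift by 32 is masked to a shift by 0 and leaves x unchanged.
        System.out.println((x << 32) == x);
    }
}
```

All three comparisons hold under Java's shift semantics, which is why the folded form cannot simply add the constants.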
>> >> Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32 bits into a 16 bits integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simple `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR looks like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitude. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. >> >> This also works for multiplications by powers of 2 since they are already translated into shifts. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Rephrase comment @marc-chevalier Nice work. Thanks for all the updates, and extra time spent on the proof! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23728#pullrequestreview-2705325337 From duke at openjdk.org Fri Mar 21 09:56:44 2025 From: duke at openjdk.org (David Linus Briemann) Date: Fri, 21 Mar 2025 09:56:44 GMT Subject: RFR: 8352065: [PPC64] C2: Implement PopCountVL, CountLeadingZerosV and CountTrailingZerosV nodes Message-ID: VectorCastL2X was not added due to bad performance and thus the bit count instructions are only vectorized for `int` but not for `long`. 
------------- Commit messages: - disable IR match rules for long bit counts on ppc - re-add PopCountVL - remove vectorization of long bit counts - 8352065: [PPC64] C2: Implement PopCountVL, CountLeadingZerosV and CountTrailingZerosV nodes Changes: https://git.openjdk.org/jdk/pull/24064/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24064&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352065 Stats: 96 lines in 5 files changed: 90 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24064.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24064/head:pull/24064 PR: https://git.openjdk.org/jdk/pull/24064 From duke at openjdk.org Fri Mar 21 10:09:12 2025 From: duke at openjdk.org (kuaiwei) Date: Fri, 21 Mar 2025 10:09:12 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v6] In-Reply-To: References: Message-ID: > In this patch, I extend the merge stores optimization to merge adjacent loads. Tier1 tests pass on my machine. 
>
> The benchmark result of MergeLoadBench.java
> AMD EPYC 9T24 96-Core Processor:
>
> | name | -MergeLoads | +MergeLoads | delta |
> |---|---|---|---|
> | MergeLoadBench.getCharB | 4352.150 | 4407.435 | 55.29 |
> | MergeLoadBench.getCharBU | 4075.320 | 4084.663 | 9.34 |
> | MergeLoadBench.getCharBV | 3221.302 | 3221.528 | 0.23 |
> | MergeLoadBench.getCharC | 2235.433 | 2238.796 | 3.36 |
> | MergeLoadBench.getCharL | 4363.244 | 4372.281 | 9.04 |
> | MergeLoadBench.getCharLU | 4072.550 | 4075.744 | 3.19 |
> | MergeLoadBench.getCharLV | 2227.825 | 2231.612 | 3.79 |
> | MergeLoadBench.getIntB | 11199.935 | 6869.030 | -4330.90 |
> | MergeLoadBench.getIntBU | 6853.862 | 2763.923 | -4089.94 |
> | MergeLoadBench.getIntBV | 306.953 | 309.911 | 2.96 |
> | MergeLoadBench.getIntL | 10426.843 | 6523.716 | -3903.13 |
> | MergeLoadBench.getIntLU | 6740.847 | 2602.701 | -4138.15 |
> | MergeLoadBench.getIntLV | 2233.151 | 2231.745 | -1.41 |
> | MergeLoadBench.getIntRB | 11335.756 | 8980.619 | -2355.14 |
> | MergeLoadBench.getIntRBU | 7439.873 | 3190.208 | -4249.66 |
> | MergeLoadBench.getIntRL | 16323.040 | 7786.842 | -8536.20 |
> | MergeLoadBench.getIntRLU | 7457.745 | 3364.140 | -4093.61 |
> | MergeLoadBench.getIntRU | 2512.621 | 2511.668 | -0.95 |
> | MergeLoadBench.getIntU | 2501.064 | 2500.629 | -0.43 |
> | MergeLoadBench.getLongB | 21175.442 | 21103.660 | -71.78 |
> | MergeLoadBench.getLongBU | 14042.046 | 2512.784 | -11529.26 |
> | MergeLoadBench.getLongBV | 606.448 | 606.171 | -0.28 |
> | MergeLoadBench.getLongL | 23142.178 | 23217.785 | 75.61 |
> | MergeLoadBench.getLongLU | 14112.972 | 2237.659 | -11875.31 |
> | MergeLoadBench.getLongLV | 2230.416 | 2231.224 | 0.81 |
> | MergeLoadBench.getLongRB | 21152.558 | 21140.583 | -11.98 |
> | MergeLoadBench.getLongRBU | 14031.178 | 2520.317 | -11510.86 |
> | MergeLoadBench.getLongRL | 23248.506 | 23136.410 | -112.10 |
> | MergeLoadBench.getLongRLU | 14125.032 | 2240.481 | -11884.55 |
> | MergeLoadBench.getLongRU | 3071.881 | 3066.606 | -5.27 |
> |Merg...
kuaiwei has updated the pull request incrementally with two additional commits since the last revision: - Enable StressIGVN and riscv platform - Change tests as review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24023/files - new: https://git.openjdk.org/jdk/pull/24023/files/1eba9308..ed5590a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=04-05 Stats: 728 lines in 3 files changed: 434 ins; 118 del; 176 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From duke at openjdk.org Fri Mar 21 10:09:15 2025 From: duke at openjdk.org (kuaiwei) Date: Fri, 21 Mar 2025 10:09:15 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: Message-ID: <8Nzu73Xjlgp4GPnNfWI9_s4xUWuQdk7fIHtKnVcDtb4=.63320d26-211f-45b0-84e6-fd93eb5f9475@github.com> On Tue, 18 Mar 2025 08:24:50 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert extract value and add more tests > > test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 235: > >> 233: } >> 234: } >> 235: } > > In the meantime, I've developed a `Verify.java`, exactly for this. Would you mind using it, it would reduce the amount of code here quite a bit ;) Use Verify.java to check. It's very useful. > test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 423: > >> 421: @IR(counts = {IRNode.LOAD_I_OF_CLASS, "byte\\\\[int:>=0] \\\\(java/lang/Cloneable,java/io/Serializable\\\\)", "1"}, >> 422: applyIf = {"UseUnalignedAccesses", "true"}, >> 423: applyIfPlatform = {"big-endian", "true"}) > > Can you please also check for byte and char loads? 
It would make sure that we do not have any loads that we are not expecting, and that the graph was cleaned appropriately. The other load types have been added to the IR rules. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2007252710 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2007255414 From duke at openjdk.org Fri Mar 21 10:09:14 2025 From: duke at openjdk.org (kuaiwei) Date: Fri, 21 Mar 2025 10:09:14 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 08:46:37 GMT, kuaiwei wrote: >> test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 44: >> >>> 42: * @run main compiler.c2.TestMergeLoads aligned >>> 43: * >>> 44: * @requires os.arch != "riscv64" | vm.cpu.features ~= ".*zbb.*" >> >> Can you remove this global requirement, so that those platforms can at least do result verification? >> You can always add restrictions to the `@IR` rules. > > Ok, I will add them to @IR rules. Fixed >> test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 69: >> >>> 67: switch (args[0]) { >>> 68: case "aligned" -> { framework.addFlags("-XX:-UseUnalignedAccesses"); } >>> 69: case "unaligned" -> { framework.addFlags("-XX:+UseUnalignedAccesses"); } >> >> Can you please also add an explicit run with `StressIGVN`? Because the flag is not whitelisted for the TestFramework, and so if it was set from the outside, the IR rules would not be executed. But it would be nice that your algorithm is stable to reorderings in IGVN ;) > > I think my optimization is not dependent on order of IGVN. I will verify it with this option. Thanks. 
`StressIGVN` tests added ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2007250651 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2007251654 From epeter at openjdk.org Fri Mar 21 10:10:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 10:10:17 GMT Subject: RFR: 8351627: C2 AArch64 ROR/ROL: assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value [v2] In-Reply-To: References: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> Message-ID: On Tue, 18 Mar 2025 03:51:55 GMT, Xiaohong Gong wrote: >> The following assertion fails on AArch64: >> >> >> Internal Error (jdk/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:2991), pid=3822987, tid=3823007 >> assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value >> >> >> with a simple Vector API case: >> >> public static IntVector test() { >> IntVector iv = IntVector.zero(IntVector.SPECIES_128); >> return iv.lanewise(VectorOperators.ROR, iv); >> } >> >> >> On AArch64, vector `ROR/ROL` (rotate right/left) operations are implemented with a combination of shifts. Please see the pattern for `ROR`: >> >> >> lsr dst1, src, cnt // unsigned right shift >> lsl dst2, src, bitSize - cnt // left shift >> orr dst, dst1, dst2 // logical or >> >> where `bitSize` is the element type width (e.g. `32` for `int`). In the above case, `cnt` is a zero constant, resulting in a left shift of 32 (`bitSize - 0`), which exceeds the instruction's valid shift count range and triggers the assertion. To fix this, we need to mask the shift count to ensure it stays within the valid range when calculating shift counts for rotate operations: `shiftCnt = shiftCnt & (bitSize - 1)`. >> >> Note that the mask is only necessary for constant shift counts. This not only fixes the assertion failure, but also allows `ROR/ROL src, 0` to be optimized to `src` directly. 
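The masking step in the quoted description can be sketched in scalar Java. This is only an illustrative model under stated assumptions: `RotateSketch`, `strictShiftLeft` and `strictShiftRightUnsigned` are hypothetical names, and the strict helpers merely imitate an instruction encoding that rejects counts outside 0..31 (Java's own `<<` would silently mask the count). It is not the AArch64 backend code.

```java
public class RotateSketch {
    static final int BIT_SIZE = 32; // element width for int

    // Hypothetical model of a left shift that rejects out-of-range immediates.
    static int strictShiftLeft(int x, int count) {
        if (count < 0 || count >= BIT_SIZE) {
            throw new IllegalArgumentException("Invalid shift value: " + count);
        }
        return x << count;
    }

    // Hypothetical model of an unsigned right shift with the same restriction.
    static int strictShiftRightUnsigned(int x, int count) {
        if (count < 0 || count >= BIT_SIZE) {
            throw new IllegalArgumentException("Invalid shift value: " + count);
        }
        return x >>> count;
    }

    // Rotate right built from two shifts and an OR, with both counts masked
    // as in shiftCnt = shiftCnt & (bitSize - 1).
    static int rotateRight(int x, int cnt) {
        int rightCnt = cnt & (BIT_SIZE - 1);
        int leftCnt = (BIT_SIZE - cnt) & (BIT_SIZE - 1); // 32 - 0 becomes 0 instead of 32
        return strictShiftRightUnsigned(x, rightCnt) | strictShiftLeft(x, leftCnt);
    }

    public static void main(String[] args) {
        int x = 0x12345678;
        System.out.println(rotateRight(x, 0) == x);
        System.out.println(rotateRight(x, 8) == Integer.rotateRight(x, 8));
    }
}
```

Without the mask on `leftCnt`, a rotate by 0 would request a left shift by 32 and trip the range check, which mirrors the assertion discussed in this thread.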
>> >> For vector variables as shift counts, the masking can be safely omitted because: >> 1. Vector shift instructions that take a vector register as the shift count may not automatically apply modulo arithmetic based on register size. When the shift count is `32` for int type, the result may be either `zeros` or `src`. However, this doesn't affect correctness for rotate since the final result is combined with `src` using a logical `OR` operation. >> 2. It saves a vector logical `AND` for masking, which is friendly to the performance. > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Update the test case Hmm so your patch adds in an extra node. It probably does not cost much, but I'd like to be sure that it's needed. Is there any case where we now have wrong results on master? Because I could not find one, only the assert on `aarch64`. But given that you are adding the `AndI` node, it seems there should be wrong results, right? Can you find a test for that? Actually, I have a question. Below, there is this section: if (!is_binary_vector_op) { shiftLCnt = phase->transform(new LShiftCntVNode(shiftLCnt, vt)); shiftRCnt = phase->transform(new RShiftCntVNode(shiftRCnt, vt)); } Can you tell me what this is for? Maybe it is something else. I see that I'm doing the same `AndI` trick in SuperWord, so maybe it is needed in general: VTransformApplyResult VTransformShiftCountNode::apply(const VLoopAnalyzer& vloop_analyzer, const GrowableArray& vnode_idx_to_transformed_node) const { PhaseIdealLoop* phase = vloop_analyzer.vloop().phase(); Node* shift_count_in = find_transformed_input(1, vnode_idx_to_transformed_node); assert(shift_count_in->bottom_type()->isa_int(), "int type only for shift count"); // The shift_count_in would be automatically truncated to the lowest _mask // bits in a scalar shift operation. But vector shift does not truncate, so // we must apply the mask now. 
Node* shift_count_masked = new AndINode(shift_count_in, phase->igvn().intcon(_mask)); register_new_node_from_vectorization(vloop_analyzer, shift_count_masked, shift_count_in); // Now that masked value is "boadcast" (some platforms only set the lowest element). VectorNode* vn = VectorNode::shift_count(_shift_opcode, shift_count_masked, _vlen, _element_bt); register_new_node_from_vectorization(vloop_analyzer, vn, shift_count_in); return VTransformApplyResult::make_vector(vn, _vlen, vn->length_in_bytes()); } Generally, I think we need better annotations in `vectornode.hpp`. So for example it would be nice if you could document the assumptions about the input above `ShiftVNode`. Do we expect the shift value to be in a specific range? Or is it wrapped like the scalar shift operator `<<` and `>>`? ------------- PR Review: https://git.openjdk.org/jdk/pull/24051#pullrequestreview-2705393105 From mdoerr at openjdk.org Fri Mar 21 10:13:12 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 21 Mar 2025 10:13:12 GMT Subject: RFR: 8352065: [PPC64] C2: Implement PopCountVL, CountLeadingZerosV and CountTrailingZerosV nodes In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 16:09:34 GMT, David Linus Briemann wrote: > VectorCastL2X was not added due to bad performance and thus the bit count instructions are only vectorized for `int` but not for `long`. I've reviewed the new instructions and the usage in the .ad file. Looks all correct. Changing `TestNumberOfContinuousZeros.java` to test vectorization with `int` types on Power9 and later makes sense. I only wonder about `TestPopCountVectorLong.java`. We're enabling the test which is supposed to test vectorization for `long`, but we run it without vectorization on PPC64? Wouldn't it be better to keep the test disabled for PPC64 (revert changes in this file)? 
------------- PR Review: https://git.openjdk.org/jdk/pull/24064#pullrequestreview-2705400116 From duke at openjdk.org Fri Mar 21 10:25:32 2025 From: duke at openjdk.org (Manuel Hässig) Date: Fri, 21 Mar 2025 10:25:32 GMT Subject: RFR: 8351515: C2 incorrectly removes double negation for double and float Message-ID: # Issue Summary A fuzzer run discovered that code of the form `0 - (0 - x)` yields the wrong result for `x = -0.0`. According to the [Java Language Spec 15.18.2](https://docs.oracle.com/javase/specs/jls/se23/html/jls-15.html#jls-15.18.2) the sum of floating point zeroes of opposite sign is positive zero. Hence, `0 - (0 - (-0.0))` must result in `+0.0`, but due to the folding of all double negations in `SubNode::Identity` this was optimized to `-0.0`. # Changeset overview To fix this issue, I excluded floating point numbers from the folding of double negations, which also includes `Float16` values. This might seem excessive at first glance, but we do not track range information for floating point types. Hence, the only case where we could still fold the double negation is a floating point constant that is not `-0.0`. However, constant folding already takes care of this. 
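To make the signed-zero behaviour from the issue summary concrete, here is a minimal Java sketch. The class `SignedZeroSketch` and the method `subtractTwiceFromZero` are hypothetical names for illustration and are not the regression test added by this PR. The sketch compares raw bit patterns, because `+0.0 == -0.0` is `true` under `==` and would hide the difference.

```java
public class SignedZeroSketch {
    // The expression shape from the bug report: 0 - (0 - x).
    static double subtractTwiceFromZero(double x) {
        return 0.0 - (0.0 - x);
    }

    // Distinguishes +0.0 from -0.0 by looking at the sign bit.
    static boolean isPositiveZero(double d) {
        return Double.doubleToRawLongBits(d) == 0L;
    }

    public static void main(String[] args) {
        double x = -0.0;
        double result = subtractTwiceFromZero(x);
        // 0.0 - (-0.0) is +0.0, and 0.0 - (+0.0) is +0.0 again,
        // so the result differs from x in its sign bit.
        System.out.println(isPositiveZero(result));
        System.out.println(isPositiveZero(x));
    }
}
```

Folding the expression to `x` would therefore turn a `+0.0` result into `-0.0`, which is observable, for example, through `1.0 / result`.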
Changes: - IR-Framework: fix `IRNode.SUB` not matching `SubHFNode` - Add a regression IR-test - Exclude floating point `SubNodes` from folding double negations # Testing - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/13989207348) - `tier1` through `tier5` plus Oracle internal testing ------------- Commit messages: - subnode: do not remove double negation of floating point numbers - Add regression test for JDK-8351515 - Add SUB_HF nodes to generic sub node matching Changes: https://git.openjdk.org/jdk/pull/24150/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24150&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351515 Stats: 88 lines in 3 files changed: 83 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24150.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24150/head:pull/24150 PR: https://git.openjdk.org/jdk/pull/24150 From chagedorn at openjdk.org Fri Mar 21 10:41:42 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 21 Mar 2025 10:41:42 GMT Subject: RFR: 8351515: C2 incorrectly removes double negation for double and float In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 10:20:53 GMT, Manuel Hässig wrote: > # Issue Summary > > A fuzzer run discovered that code of the form `0 - (0 - x)` yields the wrong result for `x = -0.0`. According to the [Java Language Spec 15.18.2](https://docs.oracle.com/javase/specs/jls/se23/html/jls-15.html#jls-15.18.2) the sum of floating point zeroes of opposite sign is positive zero. Hence, `0 - (0 - (-0.0))` must result in `+0.0`, but due to the folding of all double negations in `SubNode::Identity` this was optimized to `-0.0`. > > # Changeset overview > > To fix this issue, I excluded floating point numbers from the folding of double negations, which also includes `Float16` values. This might seem excessive at first glance, but we do not track range information for floating point types. 
Hence, we could still perform the folding of the double negation if a floating point constant is not `-0.0`. However, constant folding already takes care of this. > > Changes: > - IR-Framework: fix `IRNode.SUB`not matching `SubHFNode` > - Add a regression IR-test > - Exclude floating point `SubNodes` from folding double negations > > # Testing > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/13989207348) > - `tier1` through `tier5` plus Oracle internal testing Looks good to me. Nice to also see a `Float16` IR test :-) ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24150#pullrequestreview-2705470867 From duke at openjdk.org Fri Mar 21 11:04:32 2025 From: duke at openjdk.org (David Linus Briemann) Date: Fri, 21 Mar 2025 11:04:32 GMT Subject: RFR: 8352065: [PPC64] C2: Implement PopCountVL, CountLeadingZerosV and CountTrailingZerosV nodes [v2] In-Reply-To: References: Message-ID: <1ZrSc1xlXIf0OoyrUSCaTJUswul25Z8Ha0_VT1W9Ejg=.efa1f984-a441-488f-829c-8b8b94b57af0@github.com> > VectorCastL2X was not added due to bad performance and thus the bit count instructions are only vectorized for `int` but not for `long`. 
David Linus Briemann has updated the pull request incrementally with one additional commit since the last revision: disable TestVectorPopcountVectorLong on power again ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24064/files - new: https://git.openjdk.org/jdk/pull/24064/files/a3f027f5..afca9b3d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24064&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24064&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24064.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24064/head:pull/24064 PR: https://git.openjdk.org/jdk/pull/24064 From duke at openjdk.org Fri Mar 21 11:04:33 2025 From: duke at openjdk.org (David Linus Briemann) Date: Fri, 21 Mar 2025 11:04:33 GMT Subject: RFR: 8352065: [PPC64] C2: Implement PopCountVL, CountLeadingZerosV and CountTrailingZerosV nodes [v2] In-Reply-To: References: Message-ID: <7sbaC5lgrcUHDkjLEiI1BuuUFq6XbPhg5QUwqxhCUQM=.b86a1853-df1a-4e5d-9f6a-b60ded0216cc@github.com> On Fri, 21 Mar 2025 10:10:31 GMT, Martin Doerr wrote: > I've reviewed the new instructions and the usage in the .ad file. Looks all correct. Changing `TestNumberOfContinuousZeros.java` to test vectorization with `int` types on Power9 and later makes sense. I only wonder about `TestPopCountVectorLong.java`. We're enabling the test which is supposed to test vectorization for `long`, but we run it without vectorization on PPC64? Wouldn't it be better to keep the test disabled for PPC64 (revert changes in this file)? That makes sense. I reverted the file and the test will not be run on power again. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24064#issuecomment-2743028165 From hgreule at openjdk.org Fri Mar 21 11:07:15 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 21 Mar 2025 11:07:15 GMT Subject: RFR: 8351515: C2 incorrectly removes double negation for double and float In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 10:20:53 GMT, Manuel Hässig wrote: > # Issue Summary > > A fuzzer run discovered that code of the form `0 - (0 - x)` yields the wrong result for `x = -0.0`. According to the [Java Language Spec 15.18.2](https://docs.oracle.com/javase/specs/jls/se23/html/jls-15.html#jls-15.18.2) the sum of floating point zeroes of opposite sign is positive zero. Hence, `0 - (0 - (-0.0))` must result in `+0.0`, but due to the folding of all double negations in `SubNode::Identity` this was optimized to `-0.0`. > > # Changeset overview > > To fix this issue, I excluded floating point numbers from the folding of double negations, which also includes `Float16` values. This might seem excessive at first glance, but we do not track range information for floating point types. Hence, the only case where we could still fold the double negation is a floating point constant that is not `-0.0`. However, constant folding already takes care of this. > > Changes: > - IR-Framework: fix `IRNode.SUB` not matching `SubHFNode` > - Add a regression IR-test > - Exclude floating point `SubNodes` from folding double negations > > # Testing > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/13989207348) > - `tier1` through `tier5` plus Oracle internal testing Negation here can be a bit of a confusing term since the [JLS §
15.15.4](https://docs.oracle.com/javase/specs/jls/se23/html/jls-15.html#jls-15.15.4) says: > For floating-point values, negation is *not* the same as subtraction from zero, because if x is +0.0, then 0.0-x is +0.0, but -x is -0.0 This is also the reason why NegF/D nodes are used separately (NegI/L nodes exist but seem to be unused). The change itself looks good, but maybe it's worth clarifying the comment? Or somehow reference the comment here: https://github.com/openjdk/jdk/blob/b32be18bf940eb6eb9805390fd72e0de175c912a/src/hotspot/share/opto/subnode.hpp#L473-L478 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24150#issuecomment-2743034825 From fjiang at openjdk.org Fri Mar 21 11:09:13 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 21 Mar 2025 11:09:13 GMT Subject: RFR: 8352477: RISC-V: Print warnings when unsupported intrinsics are enabled In-Reply-To: References: Message-ID: <8tkzsnk4PSWgzw6JdpqvjjODlWokkQqqS5vl-NXnkc4=.894363fd-7669-4c70-9b91-662898e6bee0@github.com> On Thu, 20 Mar 2025 02:32:18 GMT, Fei Yang wrote: > Hi, please consider this small change. > > `UsePoly1305Intrinsics`, `UseMD5Intrinsics` and `UseSHA1Intrinsics` depend on `!AvoidUnalignedAccesses` and thus are unavailable on platforms with slow unaligned accesses. But these options could still be enabled on the command line, which I think could be surprising to our end users as these intrinsics will only have a negative impact on performance numbers for such platforms. It seems to me more reasonable to print warnings and keep them disabled when enabled by the user on such platforms. After this change, we have: > > > ubuntu at premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UseMD5Intrinsics -version > OpenJDK 64-Bit Server VM warning: Intrinsics for MD5 crypto hash functions not available on this CPU. 
> openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing) > > ubuntu at premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UsePoly1305Intrinsics -version > OpenJDK 64-Bit Server VM warning: Intrinsics for Poly1305 crypto hash functions not available on this CPU. > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing) > > ubuntu at premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UseSHA1Intrinsics -version > OpenJDK 64-Bit Server VM warning: Intrinsics for SHA-1 crypto hash functions not available on this CPU. > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing) Looks good. ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/24123#pullrequestreview-2705546961 From fyang at openjdk.org Fri Mar 21 11:14:23 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 21 Mar 2025 11:14:23 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 10:04:00 GMT, Emanuel Peter wrote: >> With the Template Framework, I'm generating IR tests randomly. But random code can always hit bailouts in compilation, and make code not compilable any more. We should have a way to disable this check, and just gracefully continue to execute the tests. 
>> >> To allow a single test method to be `not compilable`: >> https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L160-L161 >> >> To allow all test methods to be `not compilable`: >> https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java#L140-L144 >> >> See also this documentation in the code: >> https://github.com/openjdk/jdk/blob/ce40f1402387f75ea8627883979e3cbf63480941/test/hotspot/jtreg/compiler/lib/ir_framework/Test.java#L88-L94 >> >> --------------------------------------- >> >> **Background** >> >> My random code seems to hit a bailout in the Register Allocator, and I cannot do much to predict if that bailout happens. >> See https://bugs.openjdk.org/browse/JDK-8304328 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java line 33: > 31: /* > 32: * @test > 33: * @requires vm.compiler2.enabled & vm.flagless Hi, I wonder if we should add a `vm.debug == true` to the @requires list? 
I witnessed a test failure on my linux-aarch64 platform with a release build when running this test: `make test TEST="testlibrary_tests/ir_framework/tests/TestNotCompilable.java"` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2007356463 From luhenry at openjdk.org Fri Mar 21 11:20:10 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 21 Mar 2025 11:20:10 GMT Subject: RFR: 8352248: Check if CMoveX is supported [v2] In-Reply-To: References: Message-ID: <9GTiSeK8Ni4NYWMKwXrwcGTAWWapxL8DNX0J95fEtdU=.920aa89d-f8ee-4668-8d2b-ccc6b26746d3@github.com> On Wed, 19 Mar 2025 09:43:24 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Currently, it seems CMoveX nodes are fully supported on most platforms, except for riscv64. >> On riscv64, there is no efficient way to implement CMoveF/D like the other CMoveX nodes (e.g. CMoveI), but it will still bring benefit by just supporting CMoveX without CMoveF/D. This patch provides such an option. >> >> As other platforms already support CMoveX, this patch should not impact them, as `!CMoveNode::supported(_igvn.type(phi))` should always be false. >> >> BTW, in a subsequent PR for riscv, I'll implement CMoveX except for CMoveF/D, and also return false for CMoveF/D in Matcher::match_rule_supported. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor Marked as reviewed by luhenry (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24095#pullrequestreview-2705571912 From epeter at openjdk.org Fri Mar 21 11:20:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 11:20:22 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable [v4] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 11:10:31 GMT, Fei Yang wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java line 33: > >> 31: /* >> 32: * @test >> 33: * @requires vm.compiler2.enabled & vm.flagless > > Hi, I wonder if we should add a `vm.debug == true` to the @requires list? > I witnessed a test failure on my linux-aarch64 platform with a release build when running this test: > `make test TEST="testlibrary_tests/ir_framework/tests/TestNotCompilable.java"` @RealFYang Thanks for the report! I ran full testing, but somehow this slipped through! 
java.lang.RuntimeException: should have thrown TestRunException/TestVMException or IRViolationException at ir_framework.tests.TestNotCompilable.runWithExcludeExpectFailure(TestNotCompilable.java:100) at ir_framework.tests.TestNotCompilable.runTests(TestNotCompilable.java:58) at ir_framework.tests.TestNotCompilable.main(TestNotCompilable.java:41) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:565) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) at java.base/java.lang.Thread.run(Thread.java:1447) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2007365139 From epeter at openjdk.org Fri Mar 21 11:20:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 11:20:22 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable [v4] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 11:17:22 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestNotCompilable.java line 33: >> >>> 31: /* >>> 32: * @test >>> 33: * @requires vm.compiler2.enabled & vm.flagless >> >> Hi, I wonder if we should add a `vm.debug == true` to the @requires list? >> I witnessed test failure on my linux-aarch64 platform with release build when running this test: >> `make test TEST="testlibrary_tests/ir_framework/tests/TestNotCompilable.java"` > > @RealFYang Thanks for the report! I ran full testing, but somehow this slipped though! 
> > > java.lang.RuntimeException: should have thrown TestRunException/TestVMException or IRViolationException > at ir_framework.tests.TestNotCompilable.runWithExcludeExpectFailure(TestNotCompilable.java:100) > at ir_framework.tests.TestNotCompilable.runTests(TestNotCompilable.java:58) > at ir_framework.tests.TestNotCompilable.main(TestNotCompilable.java:41) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) > at java.base/java.lang.Thread.run(Thread.java:1447) I'll file a bug for it! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2007365937 From luhenry at openjdk.org Fri Mar 21 11:21:09 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 21 Mar 2025 11:21:09 GMT Subject: RFR: 8352529: RISC-V: enable loopopts tests In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 17:28:08 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? This is also a follow-up of https://github.com/openjdk/jdk/pull/23985. > There're bunch of test under `test/hotspot/jtreg/compiler/loopopts/` could be enabled for riscv. > > There are some failures for some tests after enabling them, I'll investigate further and send out separate pr after fixing them. > > Thanks! Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24138#pullrequestreview-2705574214 From luhenry at openjdk.org Fri Mar 21 11:24:13 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 21 Mar 2025 11:24:13 GMT Subject: RFR: 8320997: RISC-V: C2 ReverseV In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 10:42:21 GMT, Hamlin Li wrote: > Hi, > > Can you help to review this patch to implement ReverseV? > > Thanks! Marked as reviewed by luhenry (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24096#pullrequestreview-2705582498 From epeter at openjdk.org Fri Mar 21 11:25:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 11:25:15 GMT Subject: RFR: 8351952: [IR Framework]: allow ignoring methods that are not compilable [v4] In-Reply-To: References: Message-ID: <12eUOsjGm8dysZ6itRHL1G8nQgGwOPWGn4rkfgOyARY=.80376a51-bdad-465f-8c7f-029377211497@github.com> On Fri, 21 Mar 2025 11:17:58 GMT, Emanuel Peter wrote: >> @RealFYang Thanks for the report! I ran full testing, but somehow this slipped though! >> >> >> java.lang.RuntimeException: should have thrown TestRunException/TestVMException or IRViolationException >> at ir_framework.tests.TestNotCompilable.runWithExcludeExpectFailure(TestNotCompilable.java:100) >> at ir_framework.tests.TestNotCompilable.runTests(TestNotCompilable.java:58) >> at ir_framework.tests.TestNotCompilable.main(TestNotCompilable.java:41) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > I'll file a bug for it! 
Filed: [JDK-8352597](https://bugs.openjdk.org/browse/JDK-8352597) [IR Framework] test bug: TestNotCompilable.java fails on product build ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24049#discussion_r2007373250 From mdoerr at openjdk.org Fri Mar 21 11:40:22 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 21 Mar 2025 11:40:22 GMT Subject: RFR: 8352065: [PPC64] C2: Implement PopCountVL, CountLeadingZerosV and CountTrailingZerosV nodes [v2] In-Reply-To: <1ZrSc1xlXIf0OoyrUSCaTJUswul25Z8Ha0_VT1W9Ejg=.efa1f984-a441-488f-829c-8b8b94b57af0@github.com> References: <1ZrSc1xlXIf0OoyrUSCaTJUswul25Z8Ha0_VT1W9Ejg=.efa1f984-a441-488f-829c-8b8b94b57af0@github.com> Message-ID: On Fri, 21 Mar 2025 11:04:32 GMT, David Linus Briemann wrote: >> VectorCastL2X was not added due to bad performance and thus the bit count instructions are only vectorized for `int` but not for `long`. > > David Linus Briemann has updated the pull request incrementally with one additional commit since the last revision: > > disable TestVectorPopcountVectorLong on power again LGTM. Thanks! ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24064#pullrequestreview-2705628283 From rehn at openjdk.org Fri Mar 21 11:46:16 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 21 Mar 2025 11:46:16 GMT Subject: RFR: 8352248: Check if CMoveX is supported [v2] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 09:43:24 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> Currently, it seems CMoveX is fully supported on most platforms, except riscv64. >> On riscv64, there is no efficient way to implement CMoveF/D as other CMoveX (e.g. CMoveI), but it will still bring benefit by just supporting CMoveX without CMoveF/D. This patch is to supply such option. >> >> As other platforms already supported CMoveX, this patch should not impact them, as `!CMoveNode::supported(_igvn.type(phi))` should always be false. 
>> >> BTW, in a subsequent pr for riscv, I'll implement CMoveX except of CMoveF/D, and also return false for CMoveF/D in Matcher::match_rule_supported. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor Thanks for explaining this @Hamlin-Li! Yes, this reasonable approach, thanks! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24095#pullrequestreview-2705646201 From rehn at openjdk.org Fri Mar 21 11:58:09 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 21 Mar 2025 11:58:09 GMT Subject: RFR: 8352529: RISC-V: enable loopopts tests In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 17:28:08 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? This is also a follow-up of https://github.com/openjdk/jdk/pull/23985. > There're bunch of test under `test/hotspot/jtreg/compiler/loopopts/` could be enabled for riscv. > > There are some failures for some tests after enabling them, I'll investigate further and send out separate pr after fixing them. > > Thanks! Thank, you! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24138#pullrequestreview-2705694785 From duke at openjdk.org Fri Mar 21 12:09:40 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 21 Mar 2025 12:09:40 GMT Subject: RFR: 8351515: C2 incorrectly removes double negation for double and float [v2] In-Reply-To: References: Message-ID: > # Issue Summary > > A fuzzer run discovered that code of the form `0 - (0 - x)` yields the wrong result for `x = -0.0`. According to the [Java Language Spec 15.8.2](https://docs.oracle.com/javase/specs/jls/se23/html/jls-15.html#jls-15.18.2) the sum of floating point zeroes of opposite sign is positive zero. 
Hence, `0 - (0 - (-0.0))` must result in `+0.0`, but due to the folding of all double negations in `SubNode::Identity` this was optimized to `-0.0` > > # Changeset overview > > To fix this issue, I excluded floating point numbers from the folding of double negations, which also includes `Float16` values. This might seem excessive at first glance, but we do not track range information for floating point types. Hence, we could still perform the folding of the double negation if a floating point constant is not `-0.0`. However, constant folding already takes care of this. > > Changes: > - IR-Framework: fix `IRNode.SUB` not matching `SubHFNode` > - Add a regression IR-test > - Exclude floating point `SubNodes` from folding double negations > > # Testing > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/13989207348) > - `tier1` through `tier5` plus Oracle internal testing Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Improve comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24150/files - new: https://git.openjdk.org/jdk/pull/24150/files/7805314f..dfcc756f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24150&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24150&range=00-01 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24150.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24150/head:pull/24150 PR: https://git.openjdk.org/jdk/pull/24150 From duke at openjdk.org Fri Mar 21 12:09:40 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 21 Mar 2025 12:09:40 GMT Subject: RFR: 8351515: C2 incorrectly removes double negation for double and float In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 11:04:16 GMT, Hannes Greule wrote: >> # Issue Summary >> >> A fuzzer run discovered that code of the form `0 - (0 - x)` yields the wrong result for `x = -0.0`. 
According to the [Java Language Spec 15.8.2](https://docs.oracle.com/javase/specs/jls/se23/html/jls-15.html#jls-15.18.2) the sum of floating point zeroes of opposite sign is positive zero. Hence, `0 - (0 - (-0.0))` must result in `+0.0`, but due to the folding of all double negations in `SubNode::Identity` this was optimized to `-0.0` >> >> # Changeset overview >> >> To fix this issue, I excluded floating point numbers from the folding of double negations, which also includes `Float16` values. This might seem excessive at first glance, but we do not track range information for floating point types. Hence, we could still perform the folding of the double negation if a floating point constant is not `-0.0`. However, constant folding already takes care of this. >> >> Changes: >> - IR-Framework: fix `IRNode.SUB` not matching `SubHFNode` >> - Add a regression IR-test >> - Exclude floating point `SubNodes` from folding double negations >> >> # Testing >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/13989207348) >> - `tier1` through `tier5` plus Oracle internal testing > > Negation here can be a bit of a confusing term since the [JLS § 15.15.4](https://docs.oracle.com/javase/specs/jls/se23/html/jls-15.html#jls-15.15.4) says: >> For floating-point values, negation is *not* the same as subtraction from zero, because if x is +0.0, then 0.0-x is +0.0, but -x is -0.0 > > This is also the reason why NegF/D nodes are used separately (NegI/L nodes exist but seem to be unused). The change itself looks good, but maybe it's worth to clarify the comment? Or somehow reference the comment here: https://github.com/openjdk/jdk/blob/b32be18bf940eb6eb9805390fd72e0de175c912a/src/hotspot/share/opto/subnode.hpp#L473-L478 @SirYwell thanks for the pointer to the negation section of the spec. I improved the comment by referencing the spec and differentiating subtraction from negation. 
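For readers following the subtraction-versus-negation distinction discussed above, here is a minimal Java sketch of the signed-zero behavior. It is not part of the PR or its regression test; the class and method names (`SignedZeroDemo`, `subtractTwice`, `negateTwice`, `isNegativeZero`) are made up for illustration, and the sketch only shows what the language semantics require, not how C2 implements them.

```java
// Hypothetical illustration only; none of these names come from the PR.
class SignedZeroDemo {
    // 0.0 - (0.0 - x): two subtractions from positive zero.
    static double subtractTwice(double x) {
        return 0.0 - (0.0 - x);
    }

    // -(-x): two unary negations, each of which just flips the sign.
    static double negateTwice(double x) {
        return -(-x);
    }

    // `==` treats -0.0 and +0.0 as equal, so compare the raw bit patterns.
    static boolean isNegativeZero(double d) {
        return Double.doubleToRawLongBits(d) == Double.doubleToRawLongBits(-0.0);
    }

    public static void main(String[] args) {
        double x = -0.0;
        // 0.0 - (-0.0) is +0.0, and 0.0 - (+0.0) is +0.0 again.
        System.out.println(isNegativeZero(subtractTwice(x))); // prints "false"
        // Negating twice returns the original value, sign included.
        System.out.println(isNegativeZero(negateTwice(x)));   // prints "true"

        double y = 0.0;
        // Subtraction from zero and negation differ for +0.0.
        System.out.println(isNegativeZero(0.0 - y)); // prints "false"
        System.out.println(isNegativeZero(-y));      // prints "true"
    }
}
```

Folding `0.0 - (0.0 - x)` to `x` would turn the first result into `-0.0`, which is the miscompilation the fuzzer reported; folding `-(-x)` to `x` remains valid.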
------------- PR Comment: https://git.openjdk.org/jdk/pull/24150#issuecomment-2743181121 From mli at openjdk.org Fri Mar 21 12:12:15 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 21 Mar 2025 12:12:15 GMT Subject: RFR: 8352529: RISC-V: enable loopopts tests In-Reply-To: References: Message-ID: <_KjH7TNZPmP5QjCN1dlqAoPXQvraCni5Xlmv6lNSbA4=.56aafe7f-a609-4b95-a8b6-d1b79aa7608a@github.com> On Fri, 21 Mar 2025 11:18:43 GMT, Ludovic Henry wrote: >> Hi, >> Can you help to review this patch? This is also a follow-up of https://github.com/openjdk/jdk/pull/23985. >> There're bunch of test under `test/hotspot/jtreg/compiler/loopopts/` could be enabled for riscv. >> >> There are some failures for some tests after enabling them, I'll investigate further and send out separate pr after fixing them. >> >> Thanks! > > Marked as reviewed by luhenry (Committer). Thank you @luhenry @robehn ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24138#issuecomment-2743181518 From mli at openjdk.org Fri Mar 21 12:12:17 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 21 Mar 2025 12:12:17 GMT Subject: RFR: 8352248: Check if CMoveX is supported [v2] In-Reply-To: <9GTiSeK8Ni4NYWMKwXrwcGTAWWapxL8DNX0J95fEtdU=.920aa89d-f8ee-4668-8d2b-ccc6b26746d3@github.com> References: <9GTiSeK8Ni4NYWMKwXrwcGTAWWapxL8DNX0J95fEtdU=.920aa89d-f8ee-4668-8d2b-ccc6b26746d3@github.com> Message-ID: On Fri, 21 Mar 2025 11:17:35 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> minor > > Marked as reviewed by luhenry (Committer). Thank you @luhenry @robehn ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24095#issuecomment-2743184022 From mli at openjdk.org Fri Mar 21 12:12:16 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 21 Mar 2025 12:12:16 GMT Subject: Integrated: 8352529: RISC-V: enable loopopts tests In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 17:28:08 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? This is also a follow-up of https://github.com/openjdk/jdk/pull/23985. > There're bunch of test under `test/hotspot/jtreg/compiler/loopopts/` could be enabled for riscv. > > There are some failures for some tests after enabling them, I'll investigate further and send out separate pr after fixing them. > > Thanks! This pull request has now been integrated. Changeset: 2b559795 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/2b559795958a18d8a14d2e30d039488ad6f6ee5a Stats: 190 lines in 10 files changed: 0 ins; 0 del; 190 mod 8352529: RISC-V: enable loopopts tests Reviewed-by: fyang, luhenry, rehn ------------- PR: https://git.openjdk.org/jdk/pull/24138 From mli at openjdk.org Fri Mar 21 12:12:18 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 21 Mar 2025 12:12:18 GMT Subject: Integrated: 8352248: Check if CMoveX is supported In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 10:02:27 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > Currently, it seems CMoveX is fully supported on most platforms, except riscv64. > On riscv64, there is no efficient way to implement CMoveF/D as other CMoveX (e.g. CMoveI), but it will still bring benefit by just supporting CMoveX without CMoveF/D. This patch is to supply such option. > > As other platforms already supported CMoveX, this patch should not impact them, as `!CMoveNode::supported(_igvn.type(phi))` should always be false. > > BTW, in a subsequent pr for riscv, I'll implement CMoveX except of CMoveF/D, and also return false for CMoveF/D in Matcher::match_rule_supported. > > Thanks! 
This pull request has now been integrated. Changeset: d1cf2328 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/d1cf232893615f1907bb84728cbc1f566a369757 Stats: 20 lines in 3 files changed: 16 ins; 0 del; 4 mod 8352248: Check if CMoveX is supported Reviewed-by: chagedorn, luhenry, rehn ------------- PR: https://git.openjdk.org/jdk/pull/24095 From mli at openjdk.org Fri Mar 21 13:05:19 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 21 Mar 2025 13:05:19 GMT Subject: RFR: 8352607: RISC-V: use cmove in min/max when Zicond is supported Message-ID: Hi, Can you help to review this patch? We can let min/max to use cmove if Zicond is supported rather than a branch. At this same time, this patch also simplify the code of min/max. Thanks! ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/24153/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24153&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352607 Stats: 66 lines in 1 file changed: 1 ins; 48 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/24153.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24153/head:pull/24153 PR: https://git.openjdk.org/jdk/pull/24153 From mablakatov at openjdk.org Fri Mar 21 13:06:13 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Fri, 21 Mar 2025 13:06:13 GMT Subject: RFR: 8343689: AArch64: Optimize MulReduction implementation [v3] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 14:54:45 GMT, Mikhail Ablakatov wrote: >> Add a reduce_mul intrinsic SVE specialization for >= 256-bit long vectors. It multiplies halves of the source vector using SVE instructions to get to a 128-bit long vector that fits into a SIMD&FP register. After that point, existing ASIMD implementation is used. >> >> Nothing changes for <= 128-bit long vectors as for those the existing ASIMD implementation is used directly still. 
>>
>> The benchmarks below are from [panama-vector/vectorIntrinsics:test/micro/org/openjdk/bench/jdk/incubator/vector/operation](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation). To the best of my knowledge, openjdk/jdk is missing VectorAPI reduction micro-benchmarks.
>>
>> Benchmark results:
>>
>> Neoverse-V1 (SVE 256-bit)
>>
>> Benchmark                 (size)   Mode    master         PR  Units
>> ByteMaxVector.MULLanes      1024  thrpt  5447.643  11455.535  ops/ms
>> ShortMaxVector.MULLanes     1024  thrpt  3388.183   7144.301  ops/ms
>> IntMaxVector.MULLanes       1024  thrpt  3010.974   4911.485  ops/ms
>> LongMaxVector.MULLanes      1024  thrpt  1539.137   2562.835  ops/ms
>> FloatMaxVector.MULLanes     1024  thrpt  1355.551   4158.128  ops/ms
>> DoubleMaxVector.MULLanes    1024  thrpt  1715.854   3284.189  ops/ms
>>
>>
>> Fujitsu A64FX (SVE 512-bit):
>>
>> Benchmark                 (size)   Mode    master         PR  Units
>> ByteMaxVector.MULLanes      1024  thrpt  1091.692   2887.798  ops/ms
>> ShortMaxVector.MULLanes     1024  thrpt   597.008   1863.338  ops/ms
>> IntMaxVector.MULLanes       1024  thrpt   510.642   1348.651  ops/ms
>> LongMaxVector.MULLanes      1024  thrpt   468.878    878.620  ops/ms
>> FloatMaxVector.MULLanes     1024  thrpt   376.284   2237.564  ops/ms
>> DoubleMaxVector.MULLanes    1024  thrpt   431.343   1646.792  ops/ms
>
> Mikhail Ablakatov has updated the pull request incrementally with two additional commits since the last revision:
>
> - fixup: don't modify the value in vsrc
>
> Fix reduce_mul_integral_gt128b() so it doesn't modify vsrc. With this
> change, the result of recursive folding is held in vtmp1. To be able to
> pass this intermediate result to reduce_mul_integral_le128b(), we would
> have to use another temporary FloatRegister, as vtmp1 would essentially
> act as vsrc. It's possible to get around this however:
> reduce_mul_integral_le128b() is modified so it's possible to pass
> matching vsrc and vtmp2 arguments.
By doing this, we save ourselves a > temporary register in rules that match to reduce_mul_integral_gt128b(). > - cleanup: revert an unnecessary change to reduce_mul_fp_le128b() formatting Apologies to all of the reviewers, but it seems that I won't have time to address highlighted issues until May/June. I'm converting the PR to draft for now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23181#issuecomment-2743305206 From thartmann at openjdk.org Fri Mar 21 13:07:09 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 21 Mar 2025 13:07:09 GMT Subject: RFR: 8351515: C2 incorrectly removes double negation for double and float [v2] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 12:09:40 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> A fuzzer run discovered that code of the form `0 - (0 - x)` yields the wrong result for `x = -0.0`. According to the [Java Language Spec 15.8.2](https://docs.oracle.com/javase/specs/jls/se23/html/jls-15.html#jls-15.18.2) the sum of floating point zeroes of opposite sign is positive zero. Hence, `0 - (0 - (-0.0))` must result in `+0.0`, but due to the folding of all double negations in `SubNode::Identity` this was optimized to `-0.0` >> >> # Changeset overview >> >> To fix this issue, I excluded floating point numbers from the folding of double negations, which also includes `Float16` values. This might seem excessive at first glance, but we do not track range information for floating point types. Hence, we could still perform the folding of the double negation if a floating point constant is not `-0.0`. However, constant folding already takes care of this. 
>> >> Changes: >> - IR-Framework: fix `IRNode.SUB` not matching `SubHFNode` >> - Add a regression IR-test >> - Exclude floating point `SubNodes` from folding double negations >> >> # Testing >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/13989207348) >> - `tier1` through `tier5` plus Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Improve comment Looks good to me otherwise! src/hotspot/share/opto/subnode.cpp line 61: > 59: // is not the same as subtraction for floating point numbers > 60: // (cf. JLS § 15.15.4). `0-(0-(-0.0))` must equal to positive 0.0 according to > 61: // JLS § 15.8.2, but would result in -0.0 this would apply. Suggestion: // (cf. JLS § 15.15.4). `0-(0-(-0.0))` must be equal to positive 0.0 according to // JLS § 15.8.2, but would result in -0.0 if this folding would be applied. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24150#pullrequestreview-2705876414 PR Review Comment: https://git.openjdk.org/jdk/pull/24150#discussion_r2007540226 From duke at openjdk.org Fri Mar 21 13:10:25 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 21 Mar 2025 13:10:25 GMT Subject: RFR: 8351515: C2 incorrectly removes double negation for double and float [v3] In-Reply-To: References: Message-ID: > # Issue Summary > > A fuzzer run discovered that code of the form `0 - (0 - x)` yields the wrong result for `x = -0.0`. According to the [Java Language Spec 15.8.2](https://docs.oracle.com/javase/specs/jls/se23/html/jls-15.html#jls-15.18.2) the sum of floating point zeroes of opposite sign is positive zero. 
Hence, `0 - (0 - (-0.0))` must result in `+0.0`, but due to the folding of all double negations in `SubNode::Identity` this was optimized to `-0.0` > > # Changeset overview > > To fix this issue, I excluded floating point numbers from the folding of double negations, which also includes `Float16` values. This might seem excessive at first glance, but we do not track range information for floating point types. Hence, we could still perform the folding of the double negation if a floating point constant is not `-0.0`. However, constant folding already takes care of this. > > Changes: > - IR-Framework: fix `IRNode.SUB` not matching `SubHFNode` > - Add a regression IR-test > - Exclude floating point `SubNodes` from folding double negations > > # Testing > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/13989207348) > - `tier1` through `tier5` plus Oracle internal testing Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Fix comment wording Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24150/files - new: https://git.openjdk.org/jdk/pull/24150/files/dfcc756f..9c79f93c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24150&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24150&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24150.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24150/head:pull/24150 PR: https://git.openjdk.org/jdk/pull/24150 From mli at openjdk.org Fri Mar 21 13:20:11 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 21 Mar 2025 13:20:11 GMT Subject: RFR: 8352477: RISC-V: Print warnings when unsupported intrinsics are enabled In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 02:32:18 GMT, Fei Yang wrote: > Hi, please consider this small change. 
> > `UsePoly1305Intrinsics`, `UseMD5Intrinsics` and `UseSHA1Intrinsics` depend on `!AvoidUnalignedAccesses` and thus are unavailable on platforms with slow unaligned accesses. But these options could still be enabled on the command line, which I think could be surprising to our end users as these intrinsics will only have negative impact on performance numbers for such platforms. It seems to me more reasonable to print warnings and keep them disabled when enabled by the user on such platforms. After this change, we have: > > > ubuntu@premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UseMD5Intrinsics -version > OpenJDK 64-Bit Server VM warning: Intrinsics for MD5 crypto hash functions not available on this CPU. > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing) > > ubuntu@premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UsePoly1305Intrinsics -version > OpenJDK 64-Bit Server VM warning: Intrinsics for Poly1305 crypto hash functions not available on this CPU. > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing) > > ubuntu@premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UseSHA1Intrinsics -version > OpenJDK 64-Bit Server VM warning: Intrinsics for SHA-1 crypto hash functions not available on this CPU. > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing) Looks good, just one minor suggestion. src/hotspot/cpu/riscv/vm_version_riscv.cpp line 322: > 320: } > 321: > 322: if (!AvoidUnalignedAccesses) { In c2_initialize, it might be good to merge the multiple flags that depend on `!AvoidUnalignedAccesses` into one location. 
------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24123#pullrequestreview-2705907699 PR Review Comment: https://git.openjdk.org/jdk/pull/24123#discussion_r2007558140 From roland at openjdk.org Fri Mar 21 13:22:17 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Mar 2025 13:22:17 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v2] In-Reply-To: References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: On Wed, 19 Mar 2025 14:36:29 GMT, Christian Hagedorn wrote: >> The patch fixes the issue of creating an Initialized Assertion Predicate at a loop X from a Template Assertion Predicate that was originally created for a loop Y. Using the unrelated loop values from loop Y for the Initialized Assertion Predicate will let it fail during runtime and we execute a `halt` instruction. This was originally reported with [JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428). >> >> Note that most of the line changes are from new tests. >> >> ### The Problem >> There are multiple test cases triggering the same problem. In the following, when referring to "the test case", I'm referring to `testTemplateAssertionPredicateNotRemovedHalt()` which was written from scratch and contains more detailed comments explaining how we end up with executing a `Halt` node in more details. >> >> #### An Inner Loop without Parse Predicates >> The graph in `testTemplateAssertionPredicateNotRemovedHalt()` looks like this after creating `LoopNodes` for the outer `for` and inner `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7ac60e35-0b7e-4f04-b9dd-6eb8c8654a15) >> >> We only have Parse Predicates for the outer loop. Why? 
>> >> Before beautify loop, we have the following region which merges multiple backedges - the one from the `for` loop and the one from the `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7895161d-5ac1-46d6-93fe-5ab90ef24ab9) >> >> In `IdealLoopTree::merge_many_backedges()`, we notice that the hottest backedge is hot enough such that it is worth to have a separate merge point region for the inner and outer loop. We set everything up and eventually in `IdealLoopTree::split_outer_loop()`, we create a second `LoopNode`. >> >> For this inner `LoopNode`, we cannot set up `Parse Predicates` with the same UCTs as used for the outer loop. It would be incorrect when taking the trap to re-execute the inner and outer loop again while having already executed some of the outer loop's iterations. Thus, we get the graph shape with back-to-back `LoopNodes` as shown above. >> >> #### Predicates from a Folded Loop End up at Another Loop >> As described in the previous section, we have an inner and outer `LoopNode` while the inner does not have Parse Predicates. In a series of events (see test case comments for more details), we first hoist a range check out of the outer loop during Loop Predication with a Template Assertion Predicate. Then, we fold the outer loop away because we find that it is... > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Small things > - Fix test comments > - New approach: Marking unrelated Template Assertion Predicates outside of IGVN by storing a reference to the loop it originally was created for. > - Merge branch 'master' into JDK-8350579 > - Revert fix completely > - 8350579: Remove Template Assertion Predicates belonging to a > loop once it is folded away during IGVN Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23823#pullrequestreview-2705922512 From roland at openjdk.org Fri Mar 21 13:27:13 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Mar 2025 13:27:13 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v14] In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 07:13:59 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 30 additional commits since the last revision: >> >> - test with Long.Min/Long.Max + CONST64 >> - Merge branch 'master' into JDK-8349361 >> - review >> - Update src/hotspot/share/opto/mulnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/mulnode.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/mulnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/mulnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/mulnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update test/hotspot/jtreg/compiler/c2/irTests/RShiftLNodeIdealizationTests.java >> >> Co-authored-by: Emanuel Peter >> - Update src/hotspot/share/opto/mulnode.cpp >> >> Co-authored-by: Emanuel Peter >> - ... and 20 more: https://git.openjdk.org/jdk/compare/30ba4871...a56e397b > > Not sure if that still reproduces after your changes. LMK when I should run testing again. @eme64 any update on testing? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23438#issuecomment-2743358990 From chagedorn at openjdk.org Fri Mar 21 13:33:17 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 21 Mar 2025 13:33:17 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v2] In-Reply-To: References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: On Wed, 19 Mar 2025 14:36:29 GMT, Christian Hagedorn wrote: >> The patch fixes the issue of creating an Initialized Assertion Predicate at a loop X from a Template Assertion Predicate that was originally created for a loop Y. Using the unrelated loop values from loop Y for the Initialized Assertion Predicate will let it fail during runtime and we execute a `halt` instruction. This was originally reported with [JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428). >> >> Note that most of the line changes are from new tests. >> >> ### The Problem >> There are multiple test cases triggering the same problem. In the following, when referring to "the test case", I'm referring to `testTemplateAssertionPredicateNotRemovedHalt()` which was written from scratch and contains more detailed comments explaining how we end up with executing a `Halt` node in more details. >> >> #### An Inner Loop without Parse Predicates >> The graph in `testTemplateAssertionPredicateNotRemovedHalt()` looks like this after creating `LoopNodes` for the outer `for` and inner `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7ac60e35-0b7e-4f04-b9dd-6eb8c8654a15) >> >> We only have Parse Predicates for the outer loop. Why? 
>> >> Before beautify loop, we have the following region which merges multiple backedges - the one from the `for` loop and the one from the `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7895161d-5ac1-46d6-93fe-5ab90ef24ab9) >> >> In `IdealLoopTree::merge_many_backedges()`, we notice that the hottest backedge is hot enough such that it is worth to have a separate merge point region for the inner and outer loop. We set everything up and eventually in `IdealLoopTree::split_outer_loop()`, we create a second `LoopNode`. >> >> For this inner `LoopNode`, we cannot set up `Parse Predicates` with the same UCTs as used for the outer loop. It would be incorrect when taking the trap to re-execute the inner and outer loop again while having already executed some of the outer loop's iterations. Thus, we get the graph shape with back-to-back `LoopNodes` as shown above. >> >> #### Predicates from a Folded Loop End up at Another Loop >> As described in the previous section, we have an inner and outer `LoopNode` while the inner does not have Parse Predicates. In a series of events (see test case comments for more details), we first hoist a range check out of the outer loop during Loop Predication with a Template Assertion Predicate. Then, we fold the outer loop away because we find that it is... > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Small things > - Fix test comments > - New approach: Marking unrelated Template Assertion Predicates outside of IGVN by storing a reference to the loop it originally was created for. > - Merge branch 'master' into JDK-8350579 > - Revert fix completely > - 8350579: Remove Template Assertion Predicates belonging to a > loop once it is folded away during IGVN Thanks Roland for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23823#issuecomment-2743374851 From epeter at openjdk.org Fri Mar 21 13:33:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 13:33:29 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v14] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 16:37:01 GMT, Roland Westrelin wrote: >> This change refactors `RShiftI`/`RShiftL` `Ideal`, `Identity` and >> `Value` because the `int` and `long` versions are very similar and so that >> there's no logic duplication. In the process, support for some extra >> transformations is added to `RShiftL`. I also added some new test >> cases. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 30 additional commits since the last revision: > > - test with Long.Min/Long.Max + CONST64 > - Merge branch 'master' into JDK-8349361 > - review > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/mulnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/RShiftLNodeIdealizationTests.java > > Co-authored-by: Emanuel Peter > - Update src/hotspot/share/opto/mulnode.cpp > > Co-authored-by: Emanuel Peter > - ... and 20 more: https://git.openjdk.org/jdk/compare/cee04e4f...a56e397b Gave it a quick pass again, I think this is good to go (though better to integrate after the weekend...) @rwestrel thanks for the work and all the updates ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23438#pullrequestreview-2705956672 From epeter at openjdk.org Fri Mar 21 13:33:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 13:33:30 GMT Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v14] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 13:24:46 GMT, Roland Westrelin wrote: >> Not sure if that still reproduces after your changes. LMK when I should run testing again. > > @eme64 any update on testing? @rwestrel Testing looks good, thanks for the ping :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23438#issuecomment-2743369387 From fyang at openjdk.org Fri Mar 21 13:36:02 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 21 Mar 2025 13:36:02 GMT Subject: RFR: 8352477: RISC-V: Print warnings when unsupported intrinsics are enabled [v2] In-Reply-To: References: Message-ID: > Hi, please consider this small change. > > `UsePoly1305Intrinsics`, `UseMD5Intrinsics` and `UseSHA1Intrinsics` depend on `!AvoidUnalignedAccesses` and thus are unavailable on platforms with slow unaligned accesses. But these options could still be enabled on the command line, which I think could be surprising to our end users as these intrinsics will only have a negative impact on performance numbers for such platforms. It seems to me more reasonable to print warnings and keep them disabled when enabled by the user on such platforms. After this change, we have: > > > ubuntu at premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UseMD5Intrinsics -version > OpenJDK 64-Bit Server VM warning: Intrinsics for MD5 crypto hash functions not available on this CPU. 
> openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing) > > ubuntu at premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UsePoly1305Intrinsics -version > OpenJDK 64-Bit Server VM warning: Intrinsics for Poly1305 crypto hash functions not available on this CPU. > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing) > > ubuntu at premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UseSHA1Intrinsics -version > OpenJDK 64-Bit Server VM warning: Intrinsics for SHA-1 crypto hash functions not available on this CPU. > openjdk version "25-internal" 2025-09-16 > OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk) > OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing) Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Review comment - Merge remote-tracking branch 'upstream/master' into JDK-8352477 - 8352477: RISC-V: Print warnings when unsupported intrinsics are enabled ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24123/files - new: https://git.openjdk.org/jdk/pull/24123/files/bb02e1c6..5bca089b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24123&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24123&range=00-01 Stats: 5430 lines in 183 files changed: 1796 ins; 1924 del; 1710 mod Patch: https://git.openjdk.org/jdk/pull/24123.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24123/head:pull/24123 PR: https://git.openjdk.org/jdk/pull/24123 From fyang at openjdk.org Fri Mar 21 13:36:03 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 21 Mar 2025 13:36:03 GMT Subject: RFR: 8352477: RISC-V: Print warnings when unsupported intrinsics are enabled [v2] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 13:13:55 GMT, Hamlin Li wrote: >> Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Review comment >> - Merge remote-tracking branch 'upstream/master' into JDK-8352477 >> - 8352477: RISC-V: Print warnings when unsupported intrinsics are enabled > > src/hotspot/cpu/riscv/vm_version_riscv.cpp line 322: > >> 320: } >> 321: >> 322: if (!AvoidUnalignedAccesses) { > > In c2_initialize, it might be good to merge the multiple flags that depend on `!AvoidUnalignedAccesses` at one location. Makes sense. Fixed. Thanks for having a look. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24123#discussion_r2007586948 From roland at openjdk.org Fri Mar 21 13:43:11 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Mar 2025 13:43:11 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v4] In-Reply-To: References: Message-ID: > The `arraycopy` writes to a non-escaping array so its `ArrayCopy` node > is marked as having a narrow memory effect. One of the loads from the > destination after the copy is transformed into a load from the source > array (the rationale being that if there's no load from the > destination of the copy, the `arraycopy` is not needed). The load from > the source has the input memory state of the `ArrayCopy` as memory > input. That load is then sunk out of the loop and its control is > updated to be after the `ArrayCopy`. That's legal because the > `ArrayCopy` only has a narrow memory effect and can't modify the > source. The `ArrayCopy` can't be eliminated and is expanded. In the > process, a `MemBar` that has a wide memory effect is added. The load > from the source has control after the membar but memory state before > and because the membar has a wide memory effect, the load is anti > dependent on the membar: the graph is broken (the load can't be pinned > after the membar and anti dependent on it). > > In short, the problem is that the graph is transformed under the > assumption that the `ArrayCopy` has a narrow effect but the > `ArrayCopy` is expanded to a subgraph that has a wide memory > effect. The fix I propose is to not insert a membar with a wide memory > effect. We still need a membar when the destination is non-escaping > because the expanded `ArrayCopy`, if it writes to a tightly allocated > array, writes to raw memory and not to the destination memory slice. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23465/files - new: https://git.openjdk.org/jdk/pull/23465/files/aa7b4478..878b0b87 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23465.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23465/head:pull/23465 PR: https://git.openjdk.org/jdk/pull/23465 From thartmann at openjdk.org Fri Mar 21 13:51:14 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 21 Mar 2025 13:51:14 GMT Subject: RFR: 8351515: C2 incorrectly removes double negation for double and float [v3] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 13:10:25 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> A fuzzer run discovered that code of the form `0 - (0 - x)` yields the wrong result for `x = -0.0`. According to the [Java Language Spec 15.18.2](https://docs.oracle.com/javase/specs/jls/se23/html/jls-15.html#jls-15.18.2) the sum of floating point zeroes of opposite sign is positive zero. Hence, `0 - (0 - (-0.0))` must result in `+0.0`, but due to the folding of all double negations in `SubNode::Identity` this was optimized to `-0.0`. >> >> # Changeset overview >> >> To fix this issue, I excluded floating point numbers from the folding of double negations, which also includes `Float16` values. This might seem excessive at first glance, but we do not track range information for floating point types. Hence, we could still perform the folding of the double negation if a floating point constant is not `-0.0`. However, constant folding already takes care of this. 
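For readers following along, here is a minimal Java sketch of the semantics described in the issue summary above. The class and method names are made up for illustration and are not taken from the PR or its regression test. The sketch only shows what the language requires of `0 - (0 - x)`; when run once in the interpreter it does not exercise the C2 folding in `SubNode::Identity`.

```java
// Hypothetical demo class, not part of the PR.
final class DoubleNegationDemo {
    // Computes 0 - (0 - x) as written. The argument is a parameter, so
    // javac cannot constant-fold the expression at compile time.
    static double doubleNegate(double x) {
        return 0 - (0 - x);
    }

    // -0.0 == 0.0 is true in Java, so the sign of zero has to be checked
    // on the raw bit pattern instead of with ==.
    static boolean isNegativeZero(double d) {
        return Double.doubleToRawLongBits(d) == Double.doubleToRawLongBits(-0.0);
    }

    public static void main(String[] args) {
        double x = -0.0;
        double r = doubleNegate(x);
        // Inner step: 0.0 - (-0.0) is +0.0. Outer step: 0.0 - (+0.0) is +0.0.
        System.out.println("x is negative zero:           " + isNegativeZero(x));
        System.out.println("0 - (0 - x) is negative zero: " + isNegativeZero(r));
    }
}
```

Replacing `0 - (0 - x)` with `x` would change the second answer for `x = -0.0`, which is why the identity is only safe for integral types.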
>> >> Changes: >> - IR-Framework: fix `IRNode.SUB` not matching `SubHFNode` >> - Add a regression IR-test >> - Exclude floating point `SubNodes` from folding double negations >> >> # Testing >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/13989207348) >> - `tier1` through `tier5` plus Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment wording > > Co-authored-by: Tobias Hartmann Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24150#pullrequestreview-2706010186 From dlunden at openjdk.org Fri Mar 21 14:01:22 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 21 Mar 2025 14:01:22 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v8] In-Reply-To: References: <6l8orDGDTI-ADWxEmDjMPX1uorIhxLd3T55s0eIzJ3I=.0cb9d2c8-4302-408f-b64e-dc9a8e3d4145@github.com> <2uVU0x7eKKHHDLTEAftG-e9qUSMG95Z968T3uyMsKDc=.f24889f5-098b-4b7b-9943-a794564b69f0@github.com> Message-ID: On Fri, 21 Mar 2025 07:16:53 GMT, Daniel Lundén wrote: >> @dlunde What's the state with this one? Are you looking for reviews? > > @eme64: Yes, looking for reviews! But, let us check with @robcasloz before you start a review. He mentioned he was also going to review this (and has partially reviewed and contributed to the changeset already), and since it is quite a large changeset it would be good to coordinate our efforts. > @dlunde Ok, then I think it makes more sense if @robcasloz reviews this first, and then me second :) Roberto and I agree. Thanks Emanuel! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2743448124 From roland at openjdk.org Fri Mar 21 14:07:45 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Mar 2025 14:07:45 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v5] In-Reply-To: References: Message-ID: > The `arraycopy` writes to a non-escaping array so its `ArrayCopy` node > is marked as having a narrow memory effect. One of the loads from the > destination after the copy is transformed into a load from the source > array (the rationale being that if there's no load from the > destination of the copy, the `arraycopy` is not needed). The load from > the source has the input memory state of the `ArrayCopy` as memory > input. That load is then sunk out of the loop and its control is > updated to be after the `ArrayCopy`. That's legal because the > `ArrayCopy` only has a narrow memory effect and can't modify the > source. The `ArrayCopy` can't be eliminated and is expanded. In the > process, a `MemBar` that has a wide memory effect is added. The load > from the source has control after the membar but memory state before > and because the membar has a wide memory effect, the load is anti > dependent on the membar: the graph is broken (the load can't be pinned > after the membar and anti dependent on it). > > In short, the problem is that the graph is transformed under the > assumption that the `ArrayCopy` has a narrow effect but the > `ArrayCopy` is expanded to a subgraph that has a wide memory > effect. The fix I propose is to not insert a membar with a wide memory > effect. We still need a membar when the destination is non-escaping > because the expanded `ArrayCopy`, if it writes to a tightly allocated > array, writes to raw memory and not to the destination memory slice. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: -XX:+TraceLoopOpts fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23465/files - new: https://git.openjdk.org/jdk/pull/23465/files/878b0b87..6d48b9f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=03-04 Stats: 8 lines in 2 files changed: 5 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23465.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23465/head:pull/23465 PR: https://git.openjdk.org/jdk/pull/23465 From kvn at openjdk.org Fri Mar 21 14:09:10 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Mar 2025 14:09:10 GMT Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 17:52:46 GMT, Vladimir Kozlov wrote: >> Before [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789) `relocation_begin()` was never null even when there were no relocations - it pointed to the beginning of the constant or code section in such a case. It was used by relocation code to simplify code and avoid null checks. >> With that fix `relocation_begin()` points to an address in the `CodeBlob::_mutable_data` field, which could be `nullptr` if there is no relocation and metadata. >> >> The easy fix is to avoid `nullptr` in `CodeBlob::_mutable_data`. We could do that similar to what we do for `nmethod::_immutable_data`: [nmethod.cpp#L1514](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1514). >> >> Tested tier1-4, stress, xcomp. Verified with failed tests listed in bug report. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Update field default setting @dean-long and @bulasevich do you approve the current changes? Or should I go with the proposed RelocIterator fix? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24102#issuecomment-2743476699 From mli at openjdk.org Fri Mar 21 14:12:11 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 21 Mar 2025 14:12:11 GMT Subject: RFR: 8352477: RISC-V: Print warnings when unsupported intrinsics are enabled [v2] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 13:36:02 GMT, Fei Yang wrote: >> Hi, please consider this small change. >> >> `UsePoly1305Intrinsics`, `UseMD5Intrinsics` and `UseSHA1Intrinsics` depend on `!AvoidUnalignedAccesses` and thus are unavailable on platforms with slow unaligned accesses. But these options could still be enabled on the command line, which I think could be surprising to our end users as these intrinsics will only have a negative impact on performance numbers for such platforms. It seems to me more reasonable to print warnings and keep them disabled when enabled by the user on such platforms. After this change, we have: >> >> >> ubuntu at premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UseMD5Intrinsics -version >> OpenJDK 64-Bit Server VM warning: Intrinsics for MD5 crypto hash functions not available on this CPU. >> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing) >> >> ubuntu at premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UsePoly1305Intrinsics -version >> OpenJDK 64-Bit Server VM warning: Intrinsics for Poly1305 crypto hash functions not available on this CPU. 
>> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing) >> >> ubuntu at premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UseSHA1Intrinsics -version >> OpenJDK 64-Bit Server VM warning: Intrinsics for SHA-1 crypto hash functions not available on this CPU. >> openjdk version "25-internal" 2025-09-16 >> OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk) >> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing) > > Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Review comment > - Merge remote-tracking branch 'upstream/master' into JDK-8352477 > - 8352477: RISC-V: Print warnings when unsupported intrinsics are enabled Still good, thanks for updating! ------------- Marked as reviewed by mli (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24123#pullrequestreview-2706103934 From mbaesken at openjdk.org Fri Mar 21 14:25:50 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 21 Mar 2025 14:25:50 GMT Subject: RFR: 8352486: [ubsan] compilationMemoryStatistic.cpp:659:21: runtime error: index 64 out of bounds for type const struct unnamed struct Message-ID: When running ubsan enabled binaries on macOS aarch, the test serviceability/dcmd/compiler/CompilerMemoryStatisticTest triggers the following warning : /priv/jenkins/client-home/workspace/openjdk-jdk-weekly-macos_aarch64-opt/jdk/src/hotspot/share/compiler/compilationMemoryStatistic.cpp:659:21: runtime error: index 64 out of bounds for type 'const struct (unnamed struct at /priv/jenkins/client-home/workspace/openjdk-jdk-weekly-macos_aarch64-opt/jdk/src/hotspot/share/compiler/compilationMemoryStatistic.cpp:649:3)[64]' #0 0x108d99cf0 in void MemStatStore::iterate_sorted_filtered(MemStatStore::print_table(outputStream*, bool, unsigned long, int) const::'lambda'(MemStatEntry const*), unsigned long, int, MemStatStore::iteration_result&) const compilationMemoryStatistic.cpp:659 #1 0x108d97c68 in MemStatStore::print_table(outputStream*, bool, unsigned long, int) const compilationMemoryStatistic.cpp:734 #2 0x108d9789c in CompilationMemoryStatistic::print_all_by_size(outputStream*, bool, bool, unsigned long, int) compilationMemoryStatistic.cpp:1044 #3 0x108d97b74 in CompilationMemoryStatistic::print_jcmd_report(outputStream*, bool, bool, unsigned long) compilationMemoryStatistic.cpp:1036 #4 0x1090cc72c in DCmd::Executor::execute(DCmd*, JavaThread*) diagnosticFramework.cpp:421 #5 0x108ab6a0c in jcmd(AttachOperation*, attachStream*)::Executor::execute(DCmd*, JavaThread*) attachListener.cpp:391 #6 0x1090cbf64 in DCmd::Executor::parse_and_execute(char const*, char, JavaThread*) diagnosticFramework.cpp:414 #7 0x108ab5f98 in jcmd(AttachOperation*, attachStream*) attachListener.cpp:395 #8 0x108ab2db0 in 
AttachListenerThread::thread_entry(JavaThread*, JavaThread*) attachListener.cpp:636 #9 0x1093cc254 in JavaThread::thread_main_inner() javaThread.cpp:776 #10 0x109c08d68 in Thread::call_run() thread.cpp:231 #11 0x10992dc44 in thread_native_entry(Thread*) os_bsd.cpp:601 #12 0x1936fef90 in _pthread_start+0x84 (libsystem_pthread.dylib:arm64e+0x6f90) #13 0x1936f9d30 in thread_start+0x4 (libsystem_pthread.dylib:arm64e+0x1d30) ------------- Commit messages: - JDK-8352486 Changes: https://git.openjdk.org/jdk/pull/24156/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24156&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352486 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24156.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24156/head:pull/24156 PR: https://git.openjdk.org/jdk/pull/24156 From kvn at openjdk.org Fri Mar 21 14:35:14 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Mar 2025 14:35:14 GMT Subject: RFR: 8352486: [ubsan] compilationMemoryStatistic.cpp:659:21: runtime error: index 64 out of bounds for type const struct unnamed struct In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 14:20:28 GMT, Matthias Baesken wrote: > When running ubsan enabled binaries on macOS aarch, the test serviceability/dcmd/compiler/CompilerMemoryStatisticTest triggers the following warning : > > > /priv/jenkins/client-home/workspace/openjdk-jdk-weekly-macos_aarch64-opt/jdk/src/hotspot/share/compiler/compilationMemoryStatistic.cpp:659:21: runtime error: index 64 out of bounds for type 'const struct (unnamed struct at /priv/jenkins/client-home/workspace/openjdk-jdk-weekly-macos_aarch64-opt/jdk/src/hotspot/share/compiler/compilationMemoryStatistic.cpp:649:3)[64]' > #0 0x108d99cf0 in void MemStatStore::iterate_sorted_filtered(MemStatStore::print_table(outputStream*, bool, unsigned long, int) const::'lambda'(MemStatEntry const*), unsigned long, int, MemStatStore::iteration_result&) const 
compilationMemoryStatistic.cpp:659 > #1 0x108d97c68 in MemStatStore::print_table(outputStream*, bool, unsigned long, int) const compilationMemoryStatistic.cpp:734 > #2 0x108d9789c in CompilationMemoryStatistic::print_all_by_size(outputStream*, bool, bool, unsigned long, int) compilationMemoryStatistic.cpp:1044 > #3 0x108d97b74 in CompilationMemoryStatistic::print_jcmd_report(outputStream*, bool, bool, unsigned long) compilationMemoryStatistic.cpp:1036 > #4 0x1090cc72c in DCmd::Executor::execute(DCmd*, JavaThread*) diagnosticFramework.cpp:421 > #5 0x108ab6a0c in jcmd(AttachOperation*, attachStream*)::Executor::execute(DCmd*, JavaThread*) attachListener.cpp:391 > #6 0x1090cbf64 in DCmd::Executor::parse_and_execute(char const*, char, JavaThread*) diagnosticFramework.cpp:414 > #7 0x108ab5f98 in jcmd(AttachOperation*, attachStream*) attachListener.cpp:395 > #8 0x108ab2db0 in AttachListenerThread::thread_entry(JavaThread*, JavaThread*) attachListener.cpp:636 > #9 0x1093cc254 in JavaThread::thread_main_inner() javaThread.cpp:776 > #10 0x109c08d68 in Thread::call_run() thread.cpp:231 > #11 0x10992dc44 in thread_native_entry(Thread*) os_bsd.cpp:601 > #12 0x1936fef90 in _pthread_start+0x84 (libsystem_pthread.dylib:arm64e+0x6f90) > #13 0x1936f9d30 in thread_start+0x4 (libsystem_pthread.dylib:arm64e+0x1d30) Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24156#pullrequestreview-2706176759 From roland at openjdk.org Fri Mar 21 14:44:08 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Mar 2025 14:44:08 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v2] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 09:01:11 GMT, Christian Hagedorn wrote: > Drive by comments: Is `-XX:-UseOnStackReplacement` required to reproduce the issue? No, it's not. I trimmed the list of options for the tests. > There was also a crash when running with `-XX:+TraceLoopOpts`. 
Can you also add a run with that flag to verify that this patch also fixes that? Added. The `TraceLoopOpts` crash reproduces: the code hits a malformed counted loop. I tweaked the printing code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2007730637 From roland at openjdk.org Fri Mar 21 14:49:09 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Mar 2025 14:49:09 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v5] In-Reply-To: References: Message-ID: <4hKl3zRJ6EP4QA-iuKiEpdwIqFk2-YvrpixAGy_VidU=.e9490e22-7751-41a6-a3e7-202930be570a@github.com> On Thu, 20 Mar 2025 07:38:02 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> -XX:+TraceLoopOpts fix > > src/hotspot/share/opto/macroArrayCopy.cpp line 826: > >> 824: } >> 825: >> 826: if (is_partial_array_copy) { > > Why is this check no longer required? `ArrayCopyNode::may_modify()` performs some pattern matching and needs to be in sync with the shape of the array copy once expanded. If that shape changes then `ArrayCopyNode::may_modify()` needs to be adjusted. The code you point to was added when the shape of the expanded array copy was changed to avoid a complicated update to the pattern matching in `ArrayCopyNode::may_modify()`. What I propose is to get rid of the pattern matching because it's fragile and to instead always use the trick from that change where the final `MemBarNode` is marked, so make it unconditional. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2007739875 From rcastanedalo at openjdk.org Fri Mar 21 14:54:21 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 21 Mar 2025 14:54:21 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: <7Fzrb4C-4VyJlOMUaaFqhTzlj4o7dVXS8-EkLCiVVA4=.367fe44f-540c-49cc-aa1c-3ab381febd32@github.com> References: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> <7Fzrb4C-4VyJlOMUaaFqhTzlj4o7dVXS8-EkLCiVVA4=.367fe44f-540c-49cc-aa1c-3ab381febd32@github.com> Message-ID: On Fri, 21 Mar 2025 09:42:16 GMT, Emanuel Peter wrote: > What would be a better name though? @merykitty had the suggestions `MemNode::value_type()` or `MemNode::value_basic_type()` (see comment [above](https://github.com/openjdk/jdk/pull/24005#issuecomment-2724445592)), I like both better than the current name. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2743598373 From rcastanedalo at openjdk.org Fri Mar 21 14:59:09 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 21 Mar 2025 14:59:09 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call [v3] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 15:16:16 GMT, Tobias Hartmann wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Explicitly disable optimization for mismatching stores; add positive and negative tests > > Good catch and nice tests! The fix looks good to me. @TobiHartmann @merykitty @eme64 Thanks for reviewing! I will update the tests as suggested by @eme64 and re-run testing over the weekend. 
@RealFYang I enabled the new IR tests in `test/hotspot/jtreg/compiler/loopopts/TestArrayFillIntrinsic.java` on riscv64 because this platform seems to handle array fill intrinsification similarly to x64 and aarch64. Would you like to test it before integration? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2743614014 From mli at openjdk.org Fri Mar 21 15:03:34 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 21 Mar 2025 15:03:34 GMT Subject: RFR: 8352615: [Test] RISC-V: TestVectorizationMultiInvar.java fails on riscv64 without rvv support Message-ID: Hi, Can you help review this trivial patch? TestVectorizationMultiInvar.java fails on riscv if rvv is not supported, as the test framework verifies that `MaxVectorSize > 0`. Thanks! ------------- Commit messages: - copyright - initial commit Changes: https://git.openjdk.org/jdk/pull/24157/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24157&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352615 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24157.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24157/head:pull/24157 PR: https://git.openjdk.org/jdk/pull/24157 From epeter at openjdk.org Fri Mar 21 15:17:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Mar 2025 15:17:13 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: References: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> <7Fzrb4C-4VyJlOMUaaFqhTzlj4o7dVXS8-EkLCiVVA4=.367fe44f-540c-49cc-aa1c-3ab381febd32@github.com> Message-ID: On Fri, 21 Mar 2025 14:51:25 GMT, Roberto Castañeda Lozano wrote: > > What would be a better name though? 
> > @merykitty had the suggestions `MemNode::value_type()` or `MemNode::value_basic_type()` (see comment [above](https://github.com/openjdk/jdk/pull/24005#issuecomment-2724445592)), I like both better than the current name. @merykitty @robcasloz `MemNode::value_basic_type()` sounds like the most descriptive and accurate. Great! @robcasloz , will you file an RFE for that? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2743685337 From kvn at openjdk.org Fri Mar 21 15:32:09 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Mar 2025 15:32:09 GMT Subject: RFR: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer [v2] In-Reply-To: <4lZVM2kTFAD5ybpReIHEtTEovOvlgiG4qXDMM4Q8PS8=.32431f2d-7f48-4b1d-abf8-df762a7f839d@github.com> References: <4lZVM2kTFAD5ybpReIHEtTEovOvlgiG4qXDMM4Q8PS8=.32431f2d-7f48-4b1d-abf8-df762a7f839d@github.com> Message-ID: On Thu, 20 Mar 2025 20:06:49 GMT, Doug Simon wrote: >> This PR addresses undefined behavior in CodeBuffer by making `verify_section_allocation` return early for a partially initialized CodeBuffer. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - initialize _total_start with nullptr instead of 0 > - moved initialization of _total_start and _total_size Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24118#pullrequestreview-2706376488 From kvn at openjdk.org Fri Mar 21 15:32:09 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Mar 2025 15:32:09 GMT Subject: RFR: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer [v2] In-Reply-To: <5t0YeG1y5BKm0Zgliihna1AB1NXWwTrJJMFMlC5Res0=.4b80473c-e58d-4ef1-a321-1aa27bdcc19f@github.com> References: <4lZVM2kTFAD5ybpReIHEtTEovOvlgiG4qXDMM4Q8PS8=.32431f2d-7f48-4b1d-abf8-df762a7f839d@github.com> <5t0YeG1y5BKm0Zgliihna1AB1NXWwTrJJMFMlC5Res0=.4b80473c-e58d-4ef1-a321-1aa27bdcc19f@github.com> Message-ID: On Thu, 20 Mar 2025 20:45:33 GMT, Vladimir Kozlov wrote: > And you don't need them in [initialize(address code_start, csize_t code_size)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/codeBuffer.hpp#L487) Please ignore this suggestion. For some reasons (not enough coffee :( ) I thought it initializes fields to 0 too. Which is not. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24118#issuecomment-2743725578 From dnsimon at openjdk.org Fri Mar 21 15:37:18 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 21 Mar 2025 15:37:18 GMT Subject: RFR: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 10:16:06 GMT, Matthias Baesken wrote: >> This PR addresses undefined behavior in CodeBuffer by making `verify_section_allocation` return early for a partially initialized CodeBuffer. > > With your patch applied I do not see the failure any more, > compiler/jvmci/errors/TestInvalidCompilationResult > runs successfully. Any further comments @MBaesken ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24118#issuecomment-2743740020 From mdoerr at openjdk.org Fri Mar 21 15:39:14 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 21 Mar 2025 15:39:14 GMT Subject: RFR: 8352486: [ubsan] compilationMemoryStatistic.cpp:659:21: runtime error: index 64 out of bounds for type const struct unnamed struct In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 14:20:28 GMT, Matthias Baesken wrote: > When running ubsan enabled binaries on macOS aarch, the test serviceability/dcmd/compiler/CompilerMemoryStatisticTest triggers the following warning : > > > /priv/jenkins/client-home/workspace/openjdk-jdk-weekly-macos_aarch64-opt/jdk/src/hotspot/share/compiler/compilationMemoryStatistic.cpp:659:21: runtime error: index 64 out of bounds for type 'const struct (unnamed struct at /priv/jenkins/client-home/workspace/openjdk-jdk-weekly-macos_aarch64-opt/jdk/src/hotspot/share/compiler/compilationMemoryStatistic.cpp:649:3)[64]' > #0 0x108d99cf0 in void MemStatStore::iterate_sorted_filtered(MemStatStore::print_table(outputStream*, bool, unsigned long, int) const::'lambda'(MemStatEntry const*), unsigned long, int, MemStatStore::iteration_result&) const compilationMemoryStatistic.cpp:659 > #1 0x108d97c68 in MemStatStore::print_table(outputStream*, bool, unsigned long, int) const compilationMemoryStatistic.cpp:734 > #2 0x108d9789c in CompilationMemoryStatistic::print_all_by_size(outputStream*, bool, bool, unsigned long, int) compilationMemoryStatistic.cpp:1044 > #3 0x108d97b74 in CompilationMemoryStatistic::print_jcmd_report(outputStream*, bool, bool, unsigned long) compilationMemoryStatistic.cpp:1036 > #4 0x1090cc72c in DCmd::Executor::execute(DCmd*, JavaThread*) diagnosticFramework.cpp:421 > #5 0x108ab6a0c in jcmd(AttachOperation*, attachStream*)::Executor::execute(DCmd*, JavaThread*) attachListener.cpp:391 > #6 0x1090cbf64 in DCmd::Executor::parse_and_execute(char const*, char, JavaThread*) diagnosticFramework.cpp:414 > 
#7 0x108ab5f98 in jcmd(AttachOperation*, attachStream*) attachListener.cpp:395 > #8 0x108ab2db0 in AttachListenerThread::thread_entry(JavaThread*, JavaThread*) attachListener.cpp:636 > #9 0x1093cc254 in JavaThread::thread_main_inner() javaThread.cpp:776 > #10 0x109c08d68 in Thread::call_run() thread.cpp:231 > #11 0x10992dc44 in thread_native_entry(Thread*) os_bsd.cpp:601 > #12 0x1936fef90 in _pthread_start+0x84 (libsystem_pthread.dylib:arm64e+0x6f90) > #13 0x1936f9d30 in thread_start+0x4 (libsystem_pthread.dylib:arm64e+0x1d30) +1 ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24156#pullrequestreview-2706397192 From mbaesken at openjdk.org Fri Mar 21 15:41:20 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 21 Mar 2025 15:41:20 GMT Subject: RFR: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer [v2] In-Reply-To: <4lZVM2kTFAD5ybpReIHEtTEovOvlgiG4qXDMM4Q8PS8=.32431f2d-7f48-4b1d-abf8-df762a7f839d@github.com> References: <4lZVM2kTFAD5ybpReIHEtTEovOvlgiG4qXDMM4Q8PS8=.32431f2d-7f48-4b1d-abf8-df762a7f839d@github.com> Message-ID: On Thu, 20 Mar 2025 20:06:49 GMT, Doug Simon wrote: >> This PR addresses undefined behavior in CodeBuffer by making `verify_section_allocation` return early for a partially initialized CodeBuffer. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - initialize _total_start with nullptr instead of 0 > - moved initialization of _total_start and _total_size Looks good to me! Small nit , please adjust the COPYRIGHT info in codeBuffer.hpp too. ------------- Marked as reviewed by mbaesken (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24118#pullrequestreview-2706405603 From dnsimon at openjdk.org Fri Mar 21 15:50:38 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 21 Mar 2025 15:50:38 GMT Subject: RFR: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer [v3] In-Reply-To: References: Message-ID: > This PR addresses undefined behavior in CodeBuffer by making `verify_section_allocation` return early for a partially initialized CodeBuffer. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: adjust copyright date ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24118/files - new: https://git.openjdk.org/jdk/pull/24118/files/15d90178..8385c1d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24118&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24118&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24118.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24118/head:pull/24118 PR: https://git.openjdk.org/jdk/pull/24118 From kvn at openjdk.org Fri Mar 21 15:59:24 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Mar 2025 15:59:24 GMT Subject: RFR: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer [v3] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 15:50:38 GMT, Doug Simon wrote: >> This PR addresses undefined behavior in CodeBuffer by making `verify_section_allocation` return early for a partially initialized CodeBuffer. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > adjust copyright date Re-approved ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24118#pullrequestreview-2706447375 From dnsimon at openjdk.org Fri Mar 21 15:59:24 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 21 Mar 2025 15:59:24 GMT Subject: RFR: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer [v3] In-Reply-To: References: Message-ID: <52qLuphit6NbfpB6qqmJUinlIDqYUV8L_JWE526j44c=.ce5d8346-0c46-42aa-b880-2f794d78ff74@github.com> On Fri, 21 Mar 2025 15:50:38 GMT, Doug Simon wrote: >> This PR addresses undefined behavior in CodeBuffer by making `verify_section_allocation` return early for a partially initialized CodeBuffer. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > adjust copyright date Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24118#issuecomment-2743794004 From dnsimon at openjdk.org Fri Mar 21 15:59:24 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 21 Mar 2025 15:59:24 GMT Subject: Integrated: 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer In-Reply-To: References: Message-ID: <_mbntN9Z8-y9hTMil0PC60zI-tK2rpT2AKLJjofewHs=.258f1018-d8e3-4bec-9c10-db70c4cf7a2a@github.com> On Wed, 19 Mar 2025 15:43:54 GMT, Doug Simon wrote: > This PR addresses undefined behavior in CodeBuffer by making `verify_section_allocation` return early for a partially initialized CodeBuffer. This pull request has now been integrated. 
Changeset: b8f38563 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/b8f3856389258bba7e267ac3ae275072daec31cd Stats: 4 lines in 2 files changed: 3 ins; 0 del; 1 mod 8352420: [ubsan] codeBuffer.cpp:984:27: runtime error: applying non-zero offset 18446744073709486080 to null pointer Reviewed-by: kvn, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/24118 From rcastanedalo at openjdk.org Fri Mar 21 16:05:11 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 21 Mar 2025 16:05:11 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: References: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> <7Fzrb4C-4VyJlOMUaaFqhTzlj4o7dVXS8-EkLCiVVA4=.367fe44f-540c-49cc-aa1c-3ab381febd32@github.com> Message-ID: On Fri, 21 Mar 2025 15:15:00 GMT, Emanuel Peter wrote: > > > What would be a better name though? > > > > > > @merykitty had the suggestions `MemNode::value_type()` or `MemNode::value_basic_type()` (see comment [above](https://github.com/openjdk/jdk/pull/24005#issuecomment-2724445592)), I like both better than the current name. > > @merykitty @robcasloz `MemNode::value_basic_type()` sounds like the most descriptive and accurate. Great! @robcasloz , will you file an RFE for that? Done: [JDK-8352620](https://bugs.openjdk.org/browse/JDK-8352620). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2743810313 From roland at openjdk.org Fri Mar 21 16:28:10 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Mar 2025 16:28:10 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: <2OJ3-DbdBBiHjZKVDGNXb-IFLqXi0jI4Ezz_zs6NsMA=.a904b56a-a3c4-4f06-a708-70223beccb60@github.com> Message-ID: On Tue, 18 Mar 2025 14:48:22 GMT, Christian Hagedorn wrote: > But also a problem, indeed. 
I just think that going into the future, we should still make a reasonable effort to try and let the control path die sanely without needing this patch. It should only serve as a last resort to avoid breaking the graph. While I think it's the safest solution, my concern is that we will not find inefficiencies anymore with this patch. For example, if someone breaks Assertion Predicates, how can we detect this when the graph will always be sane? It's especially tricky now that I'm still adding Assertion Predicate patches and things might break during development and it goes unnoticed. But maybe I just need to turn this patch off locally. I agree with that. So ideally the code for this patch should only execute for those cases not properly handled some other way. I tried to figure out a way to do that but concluded it was not really possible. One thing could be to have a flag on `Compile` that's only set to true once a dangerous transformation is performed (in the case of this test case, some transformation involving cast nodes that widens the type at some point in the graph). The new logic would only execute when that flag is true. Do you think it's worth trying or would the logic still run too often to catch bugs elsewhere? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2743866070 From roland at openjdk.org Fri Mar 21 16:39:15 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Mar 2025 16:39:15 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 08:40:04 GMT, Emanuel Peter wrote: > * Should we keep the HaltNodes in the graph? The question here is if we assume that: > > * these HaltNodes should never be taken, because the data nodes have proven that the path is impossible? Then we could actually just constant fold the if before the HaltNode. 
> * these HaltNodes may be taken, because maybe there is a bug and only that led to the constant folding of the data node. Hitting a HaltNode would be a proof of a bug, and so we should keep them. It is better to crash the program than to continue in an inconsistent state down the wrong branch. Deopt would have been desirable, but restoring the state is probably near impossible. I think we want to leave the `Halt` nodes in the final code. Investigating crashes when compiled code executes is somewhat trickier than crashes when compiling and that's a drawback of this patch. If the `Halt` nodes are removed then, in case of a bug where a path that's expected unreachable is taken, execution could proceed and fail only much later. That would lead to much harder and mysterious bugs. Also it's not guaranteed that there's an `If` right before the `Cast` or that that `If` is actually the condition guarding the `Cast`. Thanks for the comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2743895442 From roland at openjdk.org Fri Mar 21 16:40:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Mar 2025 16:40:14 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v2] In-Reply-To: <52OYoC5__FdcN8OLwVgdNlb6Fz_IFo8UyKy3GUp5DiM=.708f1ee8-dbbb-4abf-8de0-d94b3b1e2ef6@github.com> References: <52OYoC5__FdcN8OLwVgdNlb6Fz_IFo8UyKy3GUp5DiM=.708f1ee8-dbbb-4abf-8de0-d94b3b1e2ef6@github.com> Message-ID: On Fri, 14 Feb 2025 18:24:25 GMT, Quan Anh Mai wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge branch 'master' into JDK-8349139 >> - fix & test > > Hmmm, may be you are right. 
I think adding a comment at `PhiNode` saying that people must not rely on it being pinned at the `Region` for dependencies would be a wise move, I can't think of any reason for that besides value narrowing right now but being pinned is a property of `Phi` regardless and we should tell people not to rely on this behaviour. > > For this bug, I think a more general fix is to try to compare the type of the `Phi` with that of the input it is going to be replaced with. If the former is not wider than the latter then we add a `CastNode`, since the cast is only about value range, not strict dependency, we can use `CarryDependency` instead of `UnconditionalDependency`. Am I right? @merykitty see above my late reply to your comments if you missed it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23617#issuecomment-2743898063 From qamai at openjdk.org Fri Mar 21 17:43:08 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 21 Mar 2025 17:43:08 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v2] In-Reply-To: References: <52OYoC5__FdcN8OLwVgdNlb6Fz_IFo8UyKy3GUp5DiM=.708f1ee8-dbbb-4abf-8de0-d94b3b1e2ef6@github.com> Message-ID: On Fri, 21 Mar 2025 16:37:44 GMT, Roland Westrelin wrote: >> Hmmm, may be you are right. I think adding a comment at `PhiNode` saying that people must not rely on it being pinned at the `Region` for dependencies would be a wise move, I can't think of any reason for that besides value narrowing right now but being pinned is a property of `Phi` regardless and we should tell people not to rely on this behaviour. >> >> For this bug, I think a more general fix is to try to compare the type of the `Phi` with that of the input it is going to be replaced with. If the former is not wider than the latter then we add a `CastNode`, since the cast is only about value range, not strict dependency, we can use `CarryDependency` instead of `UnconditionalDependency`. Am I right? 
> > @merykitty see above my late reply to your comments if you missed it. @rwestrel I have thought about this issue for a while and come to the conclusion that we do depend on a loop phi being a pinned node when doing optimizations (e.g. `init <= iv < limit`). As a result, it seems logical to insert a pinned cast here so that the `Phi` does not freely float away when the loop disappears. I agree with your patch then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23617#issuecomment-2744037407 From dhanalla at openjdk.org Fri Mar 21 18:08:08 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Fri, 21 Mar 2025 18:08:08 GMT Subject: RFR: 8350609: Cleanup unknown unwind opcode (0xB) for windows In-Reply-To: References: <4BUZBMlLC1zPnsieDNsbkji0xcmHLd_VHRN3bEhpJ3A=.5d2b315c-20de-4770-93e1-846cd733cde0@github.com> Message-ID: On Thu, 20 Mar 2025 20:13:08 GMT, Dhamoder Nalla wrote: >>> Did you get to test these functions for correctness, possibly using jtreg or some other tests ? >> >> Yes, we have validated the Jtreg Tier 1 tests, including vector-specific tests under /test/jdk/incubator/vector. > >> @dhanalla As @vivdesh asked above: do you have a regression test for this? >> >> You also have this warning above: >> >> Warning: Found leading lowercase letter in issue title for 8350609: cleanup unknown unwind opcode (0xB) for windows > I addressed the warning, and regarding regression tests, I have validated the Jtreg Tier 1 tests, including vector-specific tests under /test/jdk/incubator/vector. > @dhanalla Thanks for fixing the warning and running some tests! > > From my understanding, those tests passed before your patch here, correct? If so, then I'm wondering if there could be a regression test for this "unknown unwind opcode", that fails before your patch and passes with your patch? How feasible is that? Thanks @eme64, We are just cleaning up the unknown unwind codes that are not required. 
The unwind instructions in the .text section remain untouched. The only difference we see after this change is that the output of 'dumpbin.exe /unwindinfo jsvml.dll' will no longer display any unknown unwind opcodes identified in the DLL corresponding to these methods. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23707#issuecomment-2744094395 From duke at openjdk.org Fri Mar 21 18:45:36 2025 From: duke at openjdk.org (Marc Chevalier) Date: Fri, 21 Mar 2025 18:45:36 GMT Subject: RFR: 8352591: Missing UnlockDiagnosticVMOptions in VerifyGraphEdgesWithDeadCodeCheckFromSafepoints test Message-ID: Using `StressIGVN` in product requires `UnlockDiagnosticVMOptions`. My bad Thanks, Marc ------------- Commit messages: - Add UnlockDiagnosticVMOptions to VerifyGraphEdgesWithDeadCodeCheckFromSafepoints Changes: https://git.openjdk.org/jdk/pull/24151/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24151&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352591 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24151.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24151/head:pull/24151 PR: https://git.openjdk.org/jdk/pull/24151 From kvn at openjdk.org Fri Mar 21 18:52:06 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Mar 2025 18:52:06 GMT Subject: RFR: 8352591: Missing UnlockDiagnosticVMOptions in VerifyGraphEdgesWithDeadCodeCheckFromSafepoints test In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 10:38:09 GMT, Marc Chevalier wrote: > Using `StressIGVN` in product requires `UnlockDiagnosticVMOptions`. My bad > > Thanks, > Marc Good and trivial ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24151#pullrequestreview-2706955171 From dlong at openjdk.org Fri Mar 21 18:53:09 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 21 Mar 2025 18:53:09 GMT Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 17:52:46 GMT, Vladimir Kozlov wrote: >> Before [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789) `relocation_begin()` was never null even when there was no relocations - it pointed to the beginning of constant or code section in such case. It was used by relocation code to simplify code and avoid null checks. >> With that fix `relocation_begin()` points to address in `CodeBlob::_mutable_data` field which could be `nullptr` if there is no relocation and metadata. >> >> There easy fix is to avoid `nullptr` in `CodeBlob::_mutable_data`. We could do that similar to what we do for `nmethod::_immutable_data`: [nmethod.cpp#L1514](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1514). >> >> Tested tier1-4, stress, xcomp. Verified with failed tests listed in bug report. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Update field default setting Marked as reviewed by dlong (Reviewer). I prefer the RelocIterator fix, but I will approve this as-is if you want to do the RelocIterator fix as a separate RFE. 
------------- PR Review: https://git.openjdk.org/jdk/pull/24102#pullrequestreview-2706956947 PR Comment: https://git.openjdk.org/jdk/pull/24102#issuecomment-2744197039 From jbhateja at openjdk.org Fri Mar 21 20:25:21 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 21 Mar 2025 20:25:21 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend Message-ID: This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. Kindly review and share your feedback. Best Regards, Jatin [1] https://www.felixcloutier.com/x86/vminsh [2] https://www.felixcloutier.com/x86/vmaxsh ------------- Commit messages: - 8352585: Add special case handling for Float16.max/min x86 backend Changes: https://git.openjdk.org/jdk/pull/24169/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352585 Stats: 272 lines in 6 files changed: 266 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24169.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24169/head:pull/24169 PR: https://git.openjdk.org/jdk/pull/24169 From sparasa at openjdk.org Fri Mar 21 20:32:09 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 21 Mar 2025 20:32:09 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v14] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 22:55:18 GMT, Srinivas Vamsi Parasa wrote: >>> LGTM, >>> >>> Please file a JBS on future modification in assembler layer for EEVEX to REX/REX2 encoding and append to 
this PR before committing. >>> >>> Thanks. >> >> Thanks for the review Jatin! The JBS for EEVEX to REX/REX2 demotion has been created: https://bugs.openjdk.org/browse/JDK-8351994 >> >> Thanks, >> Vamsi > >> > @vamsi-parasa I tried to launch testing, but my script fails because of some merge issue. Would you mind merging from master? >> >> Hi Emanuel (@eme64), please see the updated code after the merge with master. > > Hi Emanuel (@eme64), could you please let me know if you're still seeing script failure? > @vamsi-parasa I launched testing now. Please ping me after the weekend for results :) Thank you, Emanuel! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2744385890 From jbhateja at openjdk.org Fri Mar 21 20:33:47 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 21 Mar 2025 20:33:47 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v2] In-Reply-To: References: Message-ID: > This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. > > Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. > > Kindly review and share your feedback. 
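For illustration only (not taken from the patch): the operand shuffle described above can be modelled in plain Java. This is a minimal sketch under several assumptions: it uses `float` instead of Float16, the class `MinMaxShuffleSketch` and its helpers `hwMin`/`hwMax` are hypothetical names, and the "hardware" model simply returns the second operand whenever the comparison is unordered or both inputs are zeros, which is how the referenced instruction pages describe the scalar min/max behaviour.

```java
public class MinMaxShuffleSketch {
    // Hypothetical model of the hardware min: if the operands are unordered
    // (either is NaN) or both are zeros, the comparison is false and the
    // second operand is returned; otherwise the smaller value is returned.
    static float hwMin(float first, float second) {
        return (first < second) ? first : second;
    }

    // Hypothetical model of the hardware max, with the same "second operand
    // wins" behaviour for NaN and for zeros of either sign.
    static float hwMax(float first, float second) {
        return (first > second) ? first : second;
    }

    // Java-style min: place the input with the sign bit set in the second
    // slot, then fix up the case where the first operand is NaN.
    static float min(float a, float b) {
        boolean aNegative = Float.floatToRawIntBits(a) < 0; // sign bit set
        float first = aNegative ? b : a;
        float second = aNegative ? a : b;
        float r = hwMin(first, second);
        return Float.isNaN(first) ? first : r; // unordered check on first
    }

    // Java-style max: place the input with the sign bit clear in the second
    // slot, then apply the same NaN fix-up on the first operand.
    static float max(float a, float b) {
        boolean aPositive = Float.floatToRawIntBits(a) >= 0; // sign bit clear
        float first = aPositive ? b : a;
        float second = aPositive ? a : b;
        float r = hwMax(first, second);
        return Float.isNaN(first) ? first : r;
    }

    public static void main(String[] args) {
        System.out.println(min(0.0f, -0.0f));
        System.out.println(max(-0.0f, 0.0f));
        System.out.println(min(Float.NaN, 1.0f));
    }
}
```

With this model, a NaN in the second slot propagates through the hardware operation itself, so only a NaN in the first slot needs the extra unordered check, and the sign-based shuffle settles the `-0.0` versus `+0.0` case.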
> > Best Regards, > Jatin > > [1] https://www.felixcloutier.com/x86/vminsh > [2] https://www.felixcloutier.com/x86/vmaxsh Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Minor cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24169/files - new: https://git.openjdk.org/jdk/pull/24169/files/824d6fbb..d1fd0d84 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24169.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24169/head:pull/24169 PR: https://git.openjdk.org/jdk/pull/24169 From bulasevich at openjdk.org Fri Mar 21 20:49:09 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 21 Mar 2025 20:49:09 GMT Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 17:52:46 GMT, Vladimir Kozlov wrote: >> Before [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789) `relocation_begin()` was never null even when there was no relocations - it pointed to the beginning of constant or code section in such case. It was used by relocation code to simplify code and avoid null checks. >> With that fix `relocation_begin()` points to address in `CodeBlob::_mutable_data` field which could be `nullptr` if there is no relocation and metadata. >> >> There easy fix is to avoid `nullptr` in `CodeBlob::_mutable_data`. We could do that similar to what we do for `nmethod::_immutable_data`: [nmethod.cpp#L1514](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1514). >> >> Tested tier1-4, stress, xcomp. Verified with failed tests listed in bug report. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Update field default setting Marked as reviewed by bulasevich (Committer). I am OK with this change. I can fix RelocIterator too. ------------- PR Review: https://git.openjdk.org/jdk/pull/24102#pullrequestreview-2707224979 PR Comment: https://git.openjdk.org/jdk/pull/24102#issuecomment-2744414297 From kvn at openjdk.org Fri Mar 21 20:54:11 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Mar 2025 20:54:11 GMT Subject: RFR: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer [v2] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 17:52:46 GMT, Vladimir Kozlov wrote: >> Before [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789) `relocation_begin()` was never null even when there was no relocations - it pointed to the beginning of constant or code section in such case. It was used by relocation code to simplify code and avoid null checks. >> With that fix `relocation_begin()` points to address in `CodeBlob::_mutable_data` field which could be `nullptr` if there is no relocation and metadata. >> >> There easy fix is to avoid `nullptr` in `CodeBlob::_mutable_data`. We could do that similar to what we do for `nmethod::_immutable_data`: [nmethod.cpp#L1514](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1514). >> >> Tested tier1-4, stress, xcomp. Verified with failed tests listed in bug report. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Update field default setting Thank you, Boris. I will push this then. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24102#issuecomment-2744422515 From kvn at openjdk.org Fri Mar 21 20:54:12 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Mar 2025 20:54:12 GMT Subject: Integrated: 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 18:35:06 GMT, Vladimir Kozlov wrote: > Before [JDK-8343789](https://bugs.openjdk.org/browse/JDK-8343789) `relocation_begin()` was never null even when there was no relocations - it pointed to the beginning of constant or code section in such case. It was used by relocation code to simplify code and avoid null checks. > With that fix `relocation_begin()` points to address in `CodeBlob::_mutable_data` field which could be `nullptr` if there is no relocation and metadata. > > There easy fix is to avoid `nullptr` in `CodeBlob::_mutable_data`. We could do that similar to what we do for `nmethod::_immutable_data`: [nmethod.cpp#L1514](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1514). > > Tested tier1-4, stress, xcomp. Verified with failed tests listed in bug report. This pull request has now been integrated. 
Changeset: 22182f71 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/22182f71ed520150b1ee05e5b788ecddfb0a6508 Stats: 9 lines in 1 file changed: 5 ins; 0 del; 4 mod 8352112: [ubsan] hotspot/share/code/relocInfo.cpp:130:37: runtime error: applying non-zero offset 18446744073709551614 to null pointer Reviewed-by: dlong, bulasevich ------------- PR: https://git.openjdk.org/jdk/pull/24102 From chagedorn at openjdk.org Fri Mar 21 21:19:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 21 Mar 2025 21:19:06 GMT Subject: RFR: 8352591: Missing UnlockDiagnosticVMOptions in VerifyGraphEdgesWithDeadCodeCheckFromSafepoints test In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 10:38:09 GMT, Marc Chevalier wrote: > Using `StressIGVN` in product requires `UnlockDiagnosticVMOptions`. My bad > > Thanks, > Marc Don't worry about it - can happen to all of us :-) Looks good and trivial, thanks for fixing it! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24151#pullrequestreview-2707279317 From vlivanov at openjdk.org Fri Mar 21 22:37:14 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 21 Mar 2025 22:37:14 GMT Subject: RFR: 8346989: Deoptimization and re-compilation cycle with C2 compiled code In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> <_3l8ylsbgvsqQE1Ihp0BUAx2o_VzcS6R2jWBSKW9u1E=.0dcb6086-ff6f-4c9a-b990-6665a476a3dc@github.com> Message-ID: On Thu, 20 Mar 2025 12:26:52 GMT, Tobias Hartmann wrote: >> src/hotspot/share/opto/library_call.cpp line 1963: >> >>> 1961: set_i_o(i_o()); >>> 1962: >>> 1963: uncommon_trap(Deoptimization::Reason_intrinsic, >> >> What about using `builtin_throw` here? (Requires some tuning on `builtin_throw` side.) How much does it affect performance? Also, passing `must_throw = true` into `uncommon_trap` may help a bit here as well. 
>
> I think adapting and re-using `builtin_throw` like you described is reasonable but I let @iwanowww confirm :slightly_smiling_face:

Yes, that's basically what I had in mind. Currently, the focus of the intrinsic is on the well-behaved case (overflows are **very** rare). `builtin_throw()` covers more ground and optimizes for scenarios where exceptions are thrown. But it depends on `ciMethod::can_omit_stack_trace()`, so the `-XX:-OmitStackTraceInFastThrow` mode will still suffer from the original problem (continuous deoptimizations), plus a round of recompilations before giving up.

I suggest improving and reusing `builtin_throw` here, and adding extra checks in the intrinsic to guard against the problematic scenario with continuous deoptimizations. IMO it improves the performance model for a wide range of use cases while addressing pathological scenarios.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r2008427776

From vlivanov at openjdk.org Fri Mar 21 22:50:20 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 21 Mar 2025 22:50:20 GMT
Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v5]
In-Reply-To: <4gi1QLJRikwQR2ShA9zy_cOK4NDsjrJK4ZyyuzuNLjc=.924da387-362c-4a39-b4da-0d347c72d354@github.com>
References: <4gi1QLJRikwQR2ShA9zy_cOK4NDsjrJK4ZyyuzuNLjc=.924da387-362c-4a39-b4da-0d347c72d354@github.com>
Message-ID: <3TUaCIHwDAl8dK1hATRK8m5XZIK1oeY8231x1HaLl3s=.ac07bdce-7807-461f-8d5a-906d50d1c411@github.com>

On Thu, 20 Mar 2025 13:05:19 GMT, Tobias Hartmann wrote:

>> Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - JDK-8302459: refactor helper method
>> - JDK-8302459: reshape infinite loop check
>
> src/hotspot/share/opto/compile.cpp line 2050:
>
>> 2048: assert(is_scheduled_for_igvn_before == is_scheduled_for_igvn_after, "call node removed from IGVN list during inlining pass");
>> 2049:
cg->call_node()->set_generator(cg);
>> 2050: }
>
> I find this a bit hard to read. Wouldn't it be semantically equivalent to this?
>
>
> if (is_scheduled_for_igvn_before == is_scheduled_for_igvn_after) {
>   cg->call_node()->set_generator(cg);
> } else {
>   assert(false, "Some useful message");
> }
>
>
> We wouldn't have separate asserts for the two cases, but I think that's fine since one can easily figure it out from the boolean values.

The difference is whether a call can be scheduled for a repeated inlining attempt in the future. `cg->call_node()->set_generator(cg)` reinitializes `cg` in `CallNode` and lets IGVN submit it for incremental inlining during future passes.

The first check guards against a situation where the call node is already on the IGVN list (so it would be automatically rescheduled for inlining during the next IGVN pass, causing an infinite loop in incremental inlining).

The second assert catches a suspicious situation where the call node disappears from the IGVN worklist during a failed inlining attempt. IMO it should not happen, hence the assert. But it is benign to allow repeated inlining in such a case.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r2008439103

From duke at openjdk.org Sat Mar 22 00:50:33 2025
From: duke at openjdk.org (Mohamed Issa)
Date: Sat, 22 Mar 2025 00:50:33 GMT
Subject: RFR: 8348638: Performance regression in Math.tanh
Message-ID:

The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657.

1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches.
2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 20**.
This new endpoint is the closest value above the minimum (**55 * ln(2) / 2**, approximately 19.06) required for correctness that's possible when only retrieving the topmost word of the input register.

The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v24-b33](https://github.com/openjdk/jdk/releases/tag/jdk-24%2B33) as the baseline version.

For performance data collected with the regression micro-benchmark referenced in the bug report, see the table below. Each result is the mean of 3 individual runs. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly improve execution times to the point where they are almost at parity with the baseline. Also, there is almost no impact on the low value (1, 2) scenarios.

| Input range (+/-) | Baseline (ms) | No fix (ms) | With fix (ms) | No fix vs baseline (%) | Fix vs baseline (%) |
| :------------------: | :-------------: | :-----------: | :-------------: | :----------------------: | :-------------------: |
| 1 | 1842 | 1961 | 1969 | +6.46 | +6.89 |
| 2 | 2102 | 2010 | 1998 | -4.38 | -4.95 |
| 100 | 801 | 1018 | 716 | +27.09 | -10.61 |
| 1000 | 498 | 803 | 519 | +61.24 | +4.22 |
| 10000 | 474 | 755 | 491 | +59.28 | +3.59 |
| 100000 | 473 | 758 | 491 | +60.25 | +3.81 |

For performance data collected with the built-in **tanh** micro-benchmark, see the table below. Each result is the mean of 8 individual runs. Overall, there is no significant impact introduced by the changes. So, the uplift provided by the original implementation of the intrinsic remains.
| Benchmark | Throughput without fix (op/s) | Throughput with fix (op/s) | Fix vs No Fix (%) |
| :-------------------------: | :-------------------------------: | :----------------------------: | :-----------------: |
| MathBench.tanhDouble | 103581 | 102610 | -0.94 |

Finally, the `jtreg:test/jdk/java/lang/Math/HyperbolicTests.java` test passed with the changes.

-------------

Commit messages:
 - Lightly restructure x86_64 tanh instrinsic implementation to resolve performance regressions found for special inputs

Changes: https://git.openjdk.org/jdk/pull/23889/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8348638
  Stats: 23 lines in 1 file changed: 6 ins; 7 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/23889.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23889/head:pull/23889

PR: https://git.openjdk.org/jdk/pull/23889

From jbhateja at openjdk.org Sat Mar 22 00:50:33 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sat, 22 Mar 2025 00:50:33 GMT
Subject: RFR: 8348638: Performance regression in Math.tanh
In-Reply-To:
References:
Message-ID:

On Tue, 4 Mar 2025 09:44:32 GMT, Mohamed Issa wrote:

> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657.
>
> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches.
> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 20**. This new endpoint is the closest value above the minimum (**55 * ln(2) / 2**) required for correctness that's possible when only retrieving the topmost word of the input register.
>
> The results of all tests posted below were captured with an [Intel®
Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v24-b33](https://github.com/openjdk/jdk/releases/tag/jdk-24%2B33) as the baseline version.
>
> For performance data collected with the regression micro-benchmark referenced in the bug report, see the table below. Each result is the mean of 3 individual runs. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly improve execution times to the point where they are almost at parity with the baseline. Also, there is almost no impact on the low value (1, 2) scenarios.
>
> | Input range (+/-) | Baseline (ms) | No fix (ms) | With fix (ms) | No fix vs baseline (%) | Fix vs baseline (%) |
> | :------------------: | :-------------: | :-----------: | :-------------: | :----------------------: | :-------------------: |
> | 1 | 1842 | 1961 | 1969 | +6.46 | +6.89 |
> | 2 | 2102 | 2010 | 1998 | -4.38 | -4.95 |
> | 100 | 801 | 1018 | 716 | +27.09 | -10.61 |
> | 1000 | 498 | 803 | 519 | +61.24 | +4.22 |
> | 10000 | 474 | 755 | 491 | +59.28 | +3.59 |
> | 100000 | 473 | 758 | 491 | +60.25 | +3.81 ...

src/hotspot/cpu/x86/stubGenerator_x86_64_tanh.cpp line 330:

> 328: __ pextrw(rax, xmm0, 3);
> 329: __ shll(rax, 16);
> 330: __ pextrw(rax, xmm0, 4);

There is an output dependency here: the result of `shll` will be overwritten by the following `pextrw`.
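To make the "closest value above the minimum" remark in the quoted description concrete, here is a minimal Java sketch. It assumes the fast path compares only the top 16 bits of |x| (the word that `pextrw(rax, xmm0, 3)` extracts), which hold the exponent and the four highest mantissa bits. The class and method names are hypothetical and are not part of the patch.

```java
public class TanhTopWordThreshold {
    // Top 16 bits of |x|: the sign bit (cleared by Math.abs), the 11 exponent
    // bits, and the 4 highest mantissa bits.
    public static int topWord(double x) {
        return (int) (Double.doubleToRawLongBits(Math.abs(x)) >>> 48);
    }

    // Smallest non-negative double whose top 16 bits equal w
    // (all lower 48 bits are zero).
    public static double smallestWithTopWord(int w) {
        return Double.longBitsToDouble(((long) w) << 48);
    }

    public static void main(String[] args) {
        // Minimum bound quoted in the PR description, about 19.06.
        double minBound = 55 * Math.log(2) / 2;
        int w = topWord(minBound);

        // The bucket containing the minimum bound starts below it, so a
        // top-word-only comparison cannot use that bucket as the threshold;
        // the next bucket is the first one that lies entirely above the bound.
        System.out.printf("minimum bound = %.4f, top word = 0x%04X%n", minBound, w);
        System.out.printf("top word 0x%04X starts at %.1f%n", w, smallestWithTopWord(w));
        System.out.printf("top word 0x%04X starts at %.1f%n", w + 1, smallestWithTopWord(w + 1));
        System.out.println("Math.tanh(20.0) == 1.0 ? " + (Math.tanh(20.0) == 1.0));
    }
}
```

Under that assumption, thresholds expressible with the top word alone move in steps of one sixteenth of a binade, which is why the first usable endpoint is a round number rather than the exact bound.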
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r1985474858

From duke at openjdk.org Sat Mar 22 00:50:33 2025
From: duke at openjdk.org (Mohamed Issa)
Date: Sat, 22 Mar 2025 00:50:33 GMT
Subject: RFR: 8348638: Performance regression in Math.tanh
In-Reply-To:
References:
Message-ID: <_ivXBLS_gJfbSFF4d47M9GlWMiivg5ikVxNnayk4PgY=.d5566cb4-4bc7-4a52-9207-139b086656be@github.com>

On Fri, 7 Mar 2025 18:01:10 GMT, Jatin Bhateja wrote:

>> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657.
>>
>> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches.
>> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 20**. This new endpoint is the closest value above the minimum (**55 * ln(2) / 2**) required for correctness that's possible when only retrieving the topmost word of the input register.
>>
>> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v24-b33](https://github.com/openjdk/jdk/releases/tag/jdk-24%2B33) as the baseline version.
>>
>> For performance data collected with the regression micro-benchmark referenced in the bug report, see the table below. Each result is the mean of 3 individual runs. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly improve execution times to the point where they are almost at parity with the baseline. Also, there is almost no impact on the low value (1, 2) scenarios.
>>
>> | Input range (+/-) | Baseline (ms) | No fix (ms) | With fix (ms) | No fix vs baseline (%) | Fix vs baseline (%) |
>> | :------------------: | :-------------: | :-----------: | :-------------: | :----------------------: | :-------------------: |
>> | 1 | 1842 | 1961 | 1969 | +6.46 | +6.89 |
>> | 2 | 2102 | 2010 | 1998 | -4.38 | -4.95 |
>> | 100 | 801 | 1018 | 716 | +27.09 | -10.61 |
>> | 1000 | 498 | 803 | 519 | +61.24 | +4.22 |
>> | 10000 | 474 | 755 | 491 | +59.28 | +3.59 |
>> | 100000 | 473 | 758 | 491 | +60.25 ...
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_tanh.cpp line 330:
>
>> 328: __ pextrw(rax, xmm0, 3);
>> 329: __ shll(rax, 16);
>> 330: __ pextrw(rax, xmm0, 4);
>
> There is an output dependency here: the result of `shll` will be overwritten by the following `pextrw`.

Thanks - will address in the next update. Using multiple _pextrw_ instructions is unnecessary anyway.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r1987395956

From fyang at openjdk.org Sat Mar 22 02:35:16 2025
From: fyang at openjdk.org (Fei Yang)
Date: Sat, 22 Mar 2025 02:35:16 GMT
Subject: RFR: 8352477: RISC-V: Print warnings when unsupported intrinsics are enabled [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 21 Mar 2025 13:36:02 GMT, Fei Yang wrote:

>> Hi, please consider this small change.
>>
>> `UsePoly1305Intrinsics`, `UseMD5Intrinsics` and `UseSHA1Intrinsics` depend on `!AvoidUnalignedAccesses` and thus are unavailable on platforms with slow unaligned accesses. But these options could still be enabled on the command line, which I think could be surprising to our end users, as these intrinsics will only have a negative impact on performance numbers for such platforms. It seems to me more reasonable to print warnings and keep them disabled when enabled by the user on such platforms.
After this change, we have:
>>
>>
>> ubuntu@premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UseMD5Intrinsics -version
>> OpenJDK 64-Bit Server VM warning: Intrinsics for MD5 crypto hash functions not available on this CPU.
>> openjdk version "25-internal" 2025-09-16
>> OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk)
>> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing)
>>
>> ubuntu@premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UsePoly1305Intrinsics -version
>> OpenJDK 64-Bit Server VM warning: Intrinsics for Poly1305 crypto hash functions not available on this CPU.
>> openjdk version "25-internal" 2025-09-16
>> OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk)
>> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing)
>>
>> ubuntu@premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UseSHA1Intrinsics -version
>> OpenJDK 64-Bit Server VM warning: Intrinsics for SHA-1 crypto hash functions not available on this CPU.
>> openjdk version "25-internal" 2025-09-16
>> OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk)
>> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing)
>
> Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
> - Review comment
> - Merge remote-tracking branch 'upstream/master' into JDK-8352477
> - 8352477: RISC-V: Print warnings when unsupported intrinsics are enabled

Thanks all for the review!
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24123#issuecomment-2744906019

From fyang at openjdk.org Sat Mar 22 02:35:16 2025
From: fyang at openjdk.org (Fei Yang)
Date: Sat, 22 Mar 2025 02:35:16 GMT
Subject: Integrated: 8352477: RISC-V: Print warnings when unsupported intrinsics are enabled
In-Reply-To:
References:
Message-ID:

On Thu, 20 Mar 2025 02:32:18 GMT, Fei Yang wrote:

> Hi, please consider this small change.
>
> `UsePoly1305Intrinsics`, `UseMD5Intrinsics` and `UseSHA1Intrinsics` depend on `!AvoidUnalignedAccesses` and thus are unavailable on platforms with slow unaligned accesses. But these options could still be enabled on the command line, which I think could be surprising to our end users, as these intrinsics will only have a negative impact on performance numbers for such platforms. It seems to me more reasonable to print warnings and keep them disabled when enabled by the user on such platforms. After this change, we have:
>
>
> ubuntu@premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UseMD5Intrinsics -version
> OpenJDK 64-Bit Server VM warning: Intrinsics for MD5 crypto hash functions not available on this CPU.
> openjdk version "25-internal" 2025-09-16
> OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk)
> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing)
>
> ubuntu@premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UsePoly1305Intrinsics -version
> OpenJDK 64-Bit Server VM warning: Intrinsics for Poly1305 crypto hash functions not available on this CPU.
> openjdk version "25-internal" 2025-09-16
> OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk)
> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing)
>
> ubuntu@premier-p550:~/jdk$ java -XX:+UnlockDiagnosticVMOptions -XX:+UseSHA1Intrinsics -version
> OpenJDK 64-Bit Server VM warning: Intrinsics for SHA-1 crypto hash functions not available on this CPU.
> openjdk version "25-internal" 2025-09-16
> OpenJDK Runtime Environment (build 25-internal-adhoc.ubuntu.jdk)
> OpenJDK 64-Bit Server VM (build 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing)

This pull request has now been integrated.

Changeset: 5dd0acb3
Author: Fei Yang
URL: https://git.openjdk.org/jdk/commit/5dd0acb3cddb96845062c0b7cee1e384e69f43cb
Stats: 23 lines in 1 file changed: 13 ins; 4 del; 6 mod

8352477: RISC-V: Print warnings when unsupported intrinsics are enabled

Reviewed-by: mli, rehn, fjiang

-------------

PR: https://git.openjdk.org/jdk/pull/24123

From syan at openjdk.org Sat Mar 22 02:39:11 2025
From: syan at openjdk.org (SendaoYan)
Date: Sat, 22 Mar 2025 02:39:11 GMT
Subject: RFR: 8352591: Missing UnlockDiagnosticVMOptions in VerifyGraphEdgesWithDeadCodeCheckFromSafepoints test
In-Reply-To:
References:
Message-ID:

On Fri, 21 Mar 2025 10:38:09 GMT, Marc Chevalier wrote:

> Using `StressIGVN` in product requires `UnlockDiagnosticVMOptions`. My bad
>
> Thanks,
> Marc

Marked as reviewed by syan (Committer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/24151#pullrequestreview-2707753390

From fyang at openjdk.org Sat Mar 22 02:40:08 2025
From: fyang at openjdk.org (Fei Yang)
Date: Sat, 22 Mar 2025 02:40:08 GMT
Subject: RFR: 8352615: [Test] RISC-V: TestVectorizationMultiInvar.java fails on riscv64 without rvv support
In-Reply-To:
References:
Message-ID:

On Fri, 21 Mar 2025 14:59:21 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review this trivial patch?
> TestVectorizationMultiInvar.java fails on riscv if rvv is not supported, as it will verify `MaxVectorSize > 0` in the test framework.
>
> Thanks!

Looks good. It seems this won't manifest on riscv64 platforms where `AlignVector` is true.

-------------

Marked as reviewed by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24157#pullrequestreview-2707753596

From fyang at openjdk.org Sat Mar 22 03:13:14 2025
From: fyang at openjdk.org (Fei Yang)
Date: Sat, 22 Mar 2025 03:13:14 GMT
Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 18 Mar 2025 12:34:39 GMT, Roberto Castañeda Lozano wrote:

>> The [array fill optimization](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/share/opto/loopTransform.cpp#L3533) replaces simple innermost loops that fill an array with copies of the same primitive value:
>>
>>
>> for (int i = 0; i < array.length; i++) {
>>     array[i] = 0;
>> }
>>
>> with a call to an [array filling intrinsic](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp#L1665) that is specialized for the array element type:
>>
>>
>> arrayof_jint_fill(array, 0, array.length)
>>
>>
>> The optimization retrieves the (basic) array element type from calling `MemNode::memory_type()` on the original filling store. This is incorrect for stores of `short` values, since these are represented by `StoreC` nodes [whose `memory_type()` is `T_CHAR`](https://github.com/openjdk/jdk/blob/1fe45265e446eeca5dc496085928ce20863a3172/src/hotspot/share/opto/memnode.hpp#L680). As a result, the optimization wrongly assigns the address type `char[]` to `short` array fill loops. This can cause miscompilations due to missing anti-dependences, see the [issue description for further detail](https://bugs.openjdk.org/projects/JDK/issues/JDK-8351468).
>>
>> This changeset proposes retrieving the (basic) array element type from the store address type instead. This ensures that the accurate address type is assigned to the intrinsic call, preventing missed anti-dependences and other potential issues caused by mismatching types.
Additionally, the changeset makes it easier to reason about correctness by explicitly disabling the optimization for mismatched stores (where the type of the value to be stored differs from the element type of the destination array). Such stores were not optimized before, but only due to pattern matching limitations.
>>
>> Assuming mismatched stores are discarded (as proposed here), an alternative solution would be to define a StoreS node returning the appropriate `memory_type()`. This could be desirable even as a complement to this fix, to prevent similar bugs in the future. I propose to investigate the introduction of a StoreS node in a separate RFE, because it is a much larger and more intrusive changeset, and go with this minimal, local, and non-intrusive fix for backportability.
>>
>> **Testing:** tier1-5, stress testing (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64).
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
> Explicitly disable optimization for mismatching stores; add positive and negative tests

@RealFYang I enabled the new IR tests in `test/hotspot/jtreg/compiler/loopopts/TestArrayFillIntrinsic.java` on riscv64 because this platform seems to handle array fill intrinsification similarly to x64 and aarch64. Would you like to test it before integration?

Hi, Thanks for the ping. Yes, both of the newly-added tests are good on linux-riscv64 platform using fastdebug build. Great!
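For reference, the `short` case discussed in the quoted description has the same shape as the `int` loop shown there. A minimal Java sketch follows; the class and method names are made up for illustration, and running it does not by itself reproduce the miscompilation, which depends on how C2 compiles surrounding code.

```java
import java.util.Arrays;

public class ShortFillLoop {
    // Simple innermost loop that stores the same short value into every
    // element; this is the loop shape the description says the array fill
    // optimization replaces with an intrinsic call.
    public static void fill(short[] array, short value) {
        for (int i = 0; i < array.length; i++) {
            array[i] = value;
        }
    }

    public static void main(String[] args) {
        short[] a = new short[8];
        fill(a, (short) -1);
        System.out.println(Arrays.toString(a));
    }
}
```

According to the description, the store in this loop is represented by a `StoreC` node, which is why deriving the element type from `memory_type()` yields `T_CHAR` rather than the `short` element type.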
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2744933487

From fyang at openjdk.org Sat Mar 22 07:37:44 2025
From: fyang at openjdk.org (Fei Yang)
Date: Sat, 22 Mar 2025 07:37:44 GMT
Subject: RFR: 8352641: [TESTBUG] VerifyGraphEdgesWithDeadCodeCheckFromSafepoints.java fails due to missing UnlockDiagnosticVMOptions
Message-ID: <1u4uoL4EBN__oeaNoITcuenbk30zPqmzG8UnQyduTHQ=.b77efeb1-af22-44b4-a144-00bec795a292@github.com>

Hi, please review this trivial change fixing a test bug.

The test reports the following error message when run with a release build.
`Error: VM option 'StressIGVN' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions.`
This adds the needed UnlockDiagnosticVMOptions option for this test. The same test passes with this extra option.

-------------

Commit messages:
 - 8352641: [TESTBUG] VerifyGraphEdgesWithDeadCodeCheckFromSafepoints.java fails due to missing UnlockDiagnosticVMOptions

Changes: https://git.openjdk.org/jdk/pull/24173/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24173&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8352641
  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/24173.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24173/head:pull/24173

PR: https://git.openjdk.org/jdk/pull/24173

From syan at openjdk.org Sat Mar 22 07:49:06 2025
From: syan at openjdk.org (SendaoYan)
Date: Sat, 22 Mar 2025 07:49:06 GMT
Subject: RFR: 8352641: [TESTBUG] VerifyGraphEdgesWithDeadCodeCheckFromSafepoints.java fails due to missing UnlockDiagnosticVMOptions
In-Reply-To: <1u4uoL4EBN__oeaNoITcuenbk30zPqmzG8UnQyduTHQ=.b77efeb1-af22-44b4-a144-00bec795a292@github.com>
References: <1u4uoL4EBN__oeaNoITcuenbk30zPqmzG8UnQyduTHQ=.b77efeb1-af22-44b4-a144-00bec795a292@github.com>
Message-ID: <6fwyHYv2A5IglkkONoYfdsIl2iEH9e-PafJydq69B5o=.3da4f9fa-7f8e-41b8-b7cb-2e63cc3b36dc@github.com>

On Sat, 22 Mar 2025 07:32:33 GMT, Fei Yang wrote:

> Hi, please review this
trivial change fixing a test bug.
>
> The test reports the following error message when run with a release build.
> `Error: VM option 'StressIGVN' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions.`
> This adds the needed UnlockDiagnosticVMOptions option for this test. The same test passes with this extra option.

Is this a duplicate of https://bugs.openjdk.org/browse/JDK-8352591?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24173#issuecomment-2745125880

From fyang at openjdk.org Sat Mar 22 07:56:09 2025
From: fyang at openjdk.org (Fei Yang)
Date: Sat, 22 Mar 2025 07:56:09 GMT
Subject: RFR: 8352641: [TESTBUG] VerifyGraphEdgesWithDeadCodeCheckFromSafepoints.java fails due to missing UnlockDiagnosticVMOptions
In-Reply-To: <6fwyHYv2A5IglkkONoYfdsIl2iEH9e-PafJydq69B5o=.3da4f9fa-7f8e-41b8-b7cb-2e63cc3b36dc@github.com>
References: <1u4uoL4EBN__oeaNoITcuenbk30zPqmzG8UnQyduTHQ=.b77efeb1-af22-44b4-a144-00bec795a292@github.com> <6fwyHYv2A5IglkkONoYfdsIl2iEH9e-PafJydq69B5o=.3da4f9fa-7f8e-41b8-b7cb-2e63cc3b36dc@github.com>
Message-ID:

On Sat, 22 Mar 2025 07:46:54 GMT, SendaoYan wrote:

> Is this a duplicate of https://bugs.openjdk.org/browse/JDK-8352591?

Ah, yes. I missed that. Will close. Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24173#issuecomment-2745128037

From fyang at openjdk.org Sat Mar 22 07:56:10 2025
From: fyang at openjdk.org (Fei Yang)
Date: Sat, 22 Mar 2025 07:56:10 GMT
Subject: Withdrawn: 8352641: [TESTBUG] VerifyGraphEdgesWithDeadCodeCheckFromSafepoints.java fails due to missing UnlockDiagnosticVMOptions
In-Reply-To: <1u4uoL4EBN__oeaNoITcuenbk30zPqmzG8UnQyduTHQ=.b77efeb1-af22-44b4-a144-00bec795a292@github.com>
References: <1u4uoL4EBN__oeaNoITcuenbk30zPqmzG8UnQyduTHQ=.b77efeb1-af22-44b4-a144-00bec795a292@github.com>
Message-ID:

On Sat, 22 Mar 2025 07:32:33 GMT, Fei Yang wrote:

> Hi, please review this trivial change fixing a test bug.
>
> The test reports the following error message when run with a release build.
> `Error: VM option 'StressIGVN' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions.`
> This adds the needed UnlockDiagnosticVMOptions option for this test. The same test passes with this extra option.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/24173

From duke at openjdk.org Sat Mar 22 12:23:47 2025
From: duke at openjdk.org (Zihao Lin)
Date: Sat, 22 Mar 2025 12:23:47 GMT
Subject: RFR: 8347706: jvmciEnv.cpp has jvmci includes out of order
Message-ID:

8347706: jvmciEnv.cpp has jvmci includes out of order

-------------

Commit messages:
 - 8347706: Reorder jvmci includes in jvmciEnv.cpp

Changes: https://git.openjdk.org/jdk/pull/24174/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24174&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8347706
  Stats: 6 lines in 1 file changed: 3 ins; 3 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/24174.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24174/head:pull/24174

PR: https://git.openjdk.org/jdk/pull/24174

From dnsimon at openjdk.org Sat Mar 22 14:44:09 2025
From: dnsimon at openjdk.org (Doug Simon)
Date: Sat, 22 Mar 2025 14:44:09 GMT
Subject: RFR: 8347706: jvmciEnv.cpp has jvmci includes out of order
In-Reply-To:
References:
Message-ID: <4eXcUGVycNCCf3Ago-Mtf7zobSoLrZVEateUS0NpQuQ=.3512d121-7ad0-422f-9dcd-3edf4e28ec4e@github.com>

On Sat, 22 Mar 2025 12:16:31 GMT, Zihao Lin wrote:

> Reorder jvmci includes in jvmciEnv.cpp

The change is fine, but I personally think manually fixing these ordering problems is not the best use of time until there's a way to automatically enforce the expected ordering (and catch regressions).

-------------

Marked as reviewed by dnsimon (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24174#pullrequestreview-2708058400

From duke at openjdk.org Sat Mar 22 14:54:06 2025
From: duke at openjdk.org (Zihao Lin)
Date: Sat, 22 Mar 2025 14:54:06 GMT
Subject: RFR: 8347706: jvmciEnv.cpp has jvmci includes out of order
In-Reply-To:
References:
Message-ID:

On Sat, 22 Mar 2025 12:16:31 GMT, Zihao Lin wrote:

> Reorder jvmci includes in jvmciEnv.cpp

You are right. Do we have a code-checking tool that helps point out ordering issues?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24174#issuecomment-2745306642

From duke at openjdk.org Sat Mar 22 14:54:06 2025
From: duke at openjdk.org (duke)
Date: Sat, 22 Mar 2025 14:54:06 GMT
Subject: RFR: 8347706: jvmciEnv.cpp has jvmci includes out of order
In-Reply-To:
References:
Message-ID:

On Sat, 22 Mar 2025 12:16:31 GMT, Zihao Lin wrote:

> Reorder jvmci includes in jvmciEnv.cpp

@linzihao1999 Your change (at version f0a6b84815d7a866a1561426d6b31a5e3f3b3c73) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24174#issuecomment-2745306821

From jbhateja at openjdk.org Sat Mar 22 17:55:27 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sat, 22 Mar 2025 17:55:27 GMT
Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v6]
In-Reply-To:
References:
Message-ID:

> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754
>
> The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma).
>
> Summary of changes included with the patch:
> 1. C2 compiler New Vector IR creation.
> 2. Auto-vectorization support.
> 3. x86 backend implementation.
> 4. New IR verification test for each newly supported vector operation.
> > Following are the performance numbers of Float16OperationsBenchmark > > System : Intel(R) Xeon(R) Processor code-named Granite rapids > Frequency fixed at 2.5 GHz > > > Baseline > Benchmark (vectorDim) Mode Cnt Score Error Units > Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms > Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms > Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms > Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms > Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms > Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms > Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms > Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms > Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms > Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms > Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms > Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms > Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms > Float16OperationsBenchmark.isInfiniteCMovBenchmark 1024 thrpt 2 1962.579 ops/ms... 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Removing Generator dependency on incubation module

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/22755/files
  - new: https://git.openjdk.org/jdk/pull/22755/files/1963d4b1..d5da1405

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=04-05

  Stats: 54 lines in 4 files changed: 28 ins; 2 del; 24 mod
  Patch: https://git.openjdk.org/jdk/pull/22755.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755

PR: https://git.openjdk.org/jdk/pull/22755

From duke at openjdk.org Sun Mar 23 00:39:11 2025
From: duke at openjdk.org (Zihao Lin)
Date: Sun, 23 Mar 2025 00:39:11 GMT
Subject: Integrated: 8347706: jvmciEnv.cpp has jvmci includes out of order
In-Reply-To:
References:
Message-ID:

On Sat, 22 Mar 2025 12:16:31 GMT, Zihao Lin wrote:

> Reorder jvmci includes in jvmciEnv.cpp

This pull request has now been integrated.

Changeset: df9210e6
Author: Zihao Lin
Committer: SendaoYan
URL: https://git.openjdk.org/jdk/commit/df9210e6578acd53384ee1ac06601510c9a52696
Stats: 6 lines in 1 file changed: 3 ins; 3 del; 0 mod

8347706: jvmciEnv.cpp has jvmci includes out of order

Reviewed-by: dnsimon

-------------

PR: https://git.openjdk.org/jdk/pull/24174

From syan at openjdk.org Sun Mar 23 01:16:20 2025
From: syan at openjdk.org (SendaoYan)
Date: Sun, 23 Mar 2025 01:16:20 GMT
Subject: RFR: 8347706: jvmciEnv.cpp has jvmci includes out of order
In-Reply-To:
References:
Message-ID:

On Sat, 22 Mar 2025 12:16:31 GMT, Zihao Lin wrote:

> Reorder jvmci includes in jvmciEnv.cpp

> /sponsor

Sorry, I did not notice that this PR had not been open for more than 24 hours...
------------- PR Comment: https://git.openjdk.org/jdk/pull/24174#issuecomment-2745952316 From duke at openjdk.org Sun Mar 23 03:44:13 2025 From: duke at openjdk.org (kuaiwei) Date: Sun, 23 Mar 2025 03:44:13 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v7] In-Reply-To: References: Message-ID: > In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine. > > The benchmark result of MergeLoadBench.java > AMD EPYC 9T24 96-Core Processor: > > |name | -MergeLoads | +MergeLoads |delta| > |---|---|---|---| > |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 | > |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 | > |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 | > |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 | > |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 | > |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 | > |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 | > |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 | > |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 | > |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 | > |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 | > |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 | > |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 | > |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 | > |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 | > |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 | > |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 | > |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 | > |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 | > |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 | > |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 | > |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 | > |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 | > |MergeLoadBench.getLongLU |14112.972 
|2237.659 | -11875.31 | > |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 | > |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 | > |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 | > |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 | > |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 | > |MergeLoadBench.getLongRU |3071.881 |3066.606 | -5.27 | > |Merg... kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Add more tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24023/files - new: https://git.openjdk.org/jdk/pull/24023/files/ed5590a9..2550996e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=05-06 Stats: 653 lines in 4 files changed: 627 ins; 10 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From duke at openjdk.org Sun Mar 23 08:28:55 2025 From: duke at openjdk.org (Zihao Lin) Date: Sun, 23 Mar 2025 08:28:55 GMT Subject: RFR: 8211759: C2: Graph after optimizations should not have dead nodes Message-ID: Move the check_no_dead_use() call after the final_graph_reshaping() call to catch dead nodes issue. 
------------- Commit messages: - 8211759: C2: Graph after optimizations should not have dead nodes Changes: https://git.openjdk.org/jdk/pull/24175/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24175&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8211759 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24175.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24175/head:pull/24175 PR: https://git.openjdk.org/jdk/pull/24175 From duke at openjdk.org Sun Mar 23 11:21:41 2025 From: duke at openjdk.org (kuaiwei) Date: Sun, 23 Mar 2025 11:21:41 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v8] In-Reply-To: References: Message-ID: > In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine. > > The benchmark result of MergeLoadBench.java > AMD EPYC 9T24 96-Core Processor: > > |name | -MergeLoads | +MergeLoads |delta| > |---|---|---|---| > |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 | > |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 | > |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 | > |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 | > |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 | > |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 | > |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 | > |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 | > |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 | > |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 | > |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 | > |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 | > |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 | > |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 | > |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 | > |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 | > |MergeLoadBench.getIntRLU |7457.745 |3364.140 
| -4093.61 | > |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 | > |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 | > |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 | > |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 | > |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 | > |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 | > |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 | > |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 | > |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 | > |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 | > |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 | > |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 | > |MergeLoadBench.getLongRU |3071.881 |3066.606 | -5.27 | > |Merg... kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Fix test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24023/files - new: https://git.openjdk.org/jdk/pull/24023/files/2550996e..d57d0278 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From dnsimon at openjdk.org Sun Mar 23 11:56:11 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Sun, 23 Mar 2025 11:56:11 GMT Subject: RFR: 8347706: jvmciEnv.cpp has jvmci includes out of order In-Reply-To: References: Message-ID: On Sat, 22 Mar 2025 14:51:16 GMT, Zihao Lin wrote: > You are right, Do we have some code check tool which help to point out the ordering issue? Not as far as I know but it should not be too hard to come up with. I've opened https://bugs.openjdk.org/browse/JDK-8352645 to have this considered. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24174#issuecomment-2746168751 From xgong at openjdk.org Mon Mar 24 02:09:15 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 24 Mar 2025 02:09:15 GMT Subject: RFR: 8351627: C2 AArch64 ROR/ROL: assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value [v2] In-Reply-To: References: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> Message-ID: On Fri, 21 Mar 2025 10:07:48 GMT, Emanuel Peter wrote: > Hmm so your patch adds in an extra node. It probably does not cost much, but I'd like to be sure that it's needed. Is there any case where we now have wrong results on master? Because I could not find one, only the assert on `aarch64`. But given that you are adding the `AndI` node, it seems there should be wrong results, right? Can you find a test for that? Actually, as I mentioned for the variable vector case in the commit message, there would be no wrong results for the rotate operation if `AndI` were not added, apart from the assertion. The only shift count value that may overflow here is `0` (other values are safe). For vector variables as shift counts, the masking can be safely omitted because vector shift instructions that take a vector register as the shift count may not automatically apply modulo arithmetic based on register size. When the shift count is 32 for int type, the result may be either zeros or src. However, this doesn't affect correctness for rotate, since the final result is combined with src using a logical OR operation. Another reason, besides the AArch64 assertion, that `AndI` is necessary here: it ends up generating a shift IR that has a duplicated `0` as the shift count, so the whole vector shift IR can be optimized out. See the transformation below: (LShiftVI src 32) -> (LShiftVI src 0) -> src > Actually, I have a question.
Below, there is this section: > > if (!is_binary_vector_op) { > shiftLCnt = phase->transform(new LShiftCntVNode(shiftLCnt, vt)); > shiftRCnt = phase->transform(new RShiftCntVNode(shiftRCnt, vt)); > } > Can you tell me what this is for? Maybe it is something else. `LShiftCntVNode` and `RShiftCntVNode` are used to generate a vector shift count IR from the scalar `shiftLCnt`/`shiftRCnt` IR. On AArch64, they are the same as `ReplicateNode`. This is necessary because the shift vector IR requires two vector inputs. > I see that I'm doing the same AndI trick in SuperWord, so maybe it is needed in general: Yes, this matches the Java spec definition of the scalar shift operation. > Generally, I think we need better annotations in vectornode.hpp. So for example it would be nice if you could document the assumptions about the input above ShiftVNode. Do we expect the shift value to be in a specific range? Or is it wrapped like the scalar shift operator << and >>? I think it depends on the instruction definition. On AArch64, the scalar shift instruction [1] itself takes the shift count modulo the element size, which matches the Java scalar shift operators `<<` and `>>`, while the vector shift instructions [2] do not. I think this is also the reason why an explicit `and` is needed in SuperWord. In summary, for vector shift IR we do not expect the shift value to be in a specific range. [1] https://developer.arm.com/documentation/ddi0602/2024-12/Base-Instructions/LSL--register---Logical-shift-left--register---an-alias-of-LSLV-?lang=en [2] https://developer.arm.com/documentation/ddi0602/2024-12/SVE-Instructions/LSL--vectors---Logical-shift-left-by-vector--predicated--?lang=en Another solution to this issue: maybe we could specially handle the case of a constant shift count of `0`. Do you think that would be better?
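To make the scalar-versus-vector point above concrete, here is a minimal Java sketch. It is illustrative only: the class `RotateShiftDemo` and all of its methods are hypothetical names, and the "no-wrap" helpers are an assumed model of a shift whose count is not taken modulo the element size (the all-zeros outcome mentioned above), not the behaviour of any particular instruction.

```java
public class RotateShiftDemo {
    // Rotate left built from two shifts and an OR. Java masks an int shift
    // count to its low 5 bits, so a count of 32 behaves like a count of 0.
    static int rotateLeftByShifts(int x, int s) {
        return (x << s) | (x >>> (32 - s));
    }

    // Assumed model of a shift that does NOT wrap its count:
    // a count of 32 or more shifts every bit out and yields zero.
    static int shiftLeftNoWrap(int x, int s) {
        return s >= 32 ? 0 : x << s;
    }

    static int shiftRightNoWrap(int x, int s) {
        return s >= 32 ? 0 : x >>> s;
    }

    // Same rotate shape, but using the non-wrapping shifts.
    static int rotateLeftNoWrap(int x, int s) {
        return shiftLeftNoWrap(x, s) | shiftRightNoWrap(x, 32 - s);
    }

    public static void main(String[] args) {
        int x = 0x12345678;
        // Scalar Java: shifting by 32 is the same as shifting by 0.
        System.out.println((x << 32) == x);
        // For every rotate distance 0..31, both variants agree with
        // Integer.rotateLeft, including s == 0, where one half shifts by 32.
        boolean allMatch = true;
        for (int s = 0; s < 32; s++) {
            int expected = Integer.rotateLeft(x, s);
            allMatch &= rotateLeftByShifts(x, s) == expected;
            allMatch &= rotateLeftNoWrap(x, s) == expected;
        }
        System.out.println(allMatch);
    }
}
```

With a rotate distance of 0, one half of the rotate shifts by 32. Under Java's wrapping rule that half yields `x`, and under the assumed non-wrapping model it yields 0. Either way the OR with the other half gives `x`, which is why the rotate result is unaffected in this sketch.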
------------- PR Comment: https://git.openjdk.org/jdk/pull/24051#issuecomment-2746686827 From xgong at openjdk.org Mon Mar 24 02:10:11 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 24 Mar 2025 02:10:11 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 07:28:04 GMT, Xiaohong Gong wrote: >> Hi @eme64, the IR test is updated according to your suggestion. Could you please look at it again? Thanks so much! > >> @XiaohongGong Could you please also merge here before I rerun the testing? > > Sure, I have rebased. Thanks a lot for your testing! > @XiaohongGong Tests launched! Please ping me after the weekend for the results Hi @eme64, thanks for your testing! May I ask about the test results please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2746691248 From xgong at openjdk.org Mon Mar 24 02:11:07 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 24 Mar 2025 02:11:07 GMT Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2] In-Reply-To: References: <5fk0CfImI-utRMfmsA78i_uQJjoej9bsLqxsAbRZHLk=.b5aece2b-090d-4fbc-aa64-3acf8fedc41b@github.com> Message-ID: <8gRkivkxdlGCezJE_ZtvkO7ONzLpIpzY0PXT-6MBNI8=.9719afc9-94e9-45ed-a2d4-63e6a3593402@github.com> On Thu, 20 Mar 2025 07:28:58 GMT, Xiaohong Gong wrote: >> @XiaohongGong Can you please merge with master before I launch testing? > > Hi @eme64, I've rebased this PR. Thanks a lot for your testing! > @XiaohongGong Testing launched! Please ping me after the weekend for the results ;) Thanks for your testing @eme64. May I ask about the test results please?
------------- PR Comment: https://git.openjdk.org/jdk/pull/23608#issuecomment-2746692360 From duke at openjdk.org Mon Mar 24 04:29:03 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 24 Mar 2025 04:29:03 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v9] In-Reply-To: References: Message-ID: > In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine. > > The benchmark result of MergeLoadBench.java > AMD EPYC 9T24 96-Core Processor: > > |name | -MergeLoads | +MergeLoads |delta| > |---|---|---|---| > |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 | > |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 | > |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 | > |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 | > |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 | > |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 | > |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 | > |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 | > |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 | > |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 | > |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 | > |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 | > |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 | > |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 | > |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 | > |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 | > |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 | > |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 | > |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 | > |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 | > |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 | > |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 | > |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 | > |MergeLoadBench.getLongLU |14112.972 
|2237.659 | -11875.31 | > |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 | > |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 | > |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 | > |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 | > |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 | > |MergeLoadBench.getLongRU |3071.881 |3066.606 | -5.27 | > |Merg... kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into dev/merge_loads - Fix test - Add more tests - Enable StressIGVN and riscv platform - Change tests as review comments - Fix test failure and change for review comments - Revert extract value and add more tests - Add tests - Fix test failure - Remove some debug trace - ... and 1 more: https://git.openjdk.org/jdk/compare/9f743103...e37c4bf3 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24023/files - new: https://git.openjdk.org/jdk/pull/24023/files/d57d0278..e37c4bf3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=07-08 Stats: 39340 lines in 731 files changed: 17868 ins; 16581 del; 4891 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From fyang at openjdk.org Mon Mar 24 04:59:06 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 24 Mar 2025 04:59:06 GMT Subject: RFR: 8352607: RISC-V: use cmove in min/max when Zicond is supported In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 12:53:01 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > We can let min/max to use cmove if Zicond is supported rather than a branch. 
> At the same time, this patch also simplifies the code of min/max. > > Thanks! Hi, Is there any JMH data to look at? src/hotspot/cpu/riscv/riscv.ad line 9073: > 9071: ins_encode %{ > 9072: __ cmov_gt(as_Register($dst$$reg), as_Register($src$$reg), > 9073: as_Register($dst$$reg), as_Register($src$$reg)); But the `ins_cost` isn't updated to reflect this change? It is still `BRANCH_COST + ALU_COST`, which only reflects the branch code. It seems better to create separate match rules for these cmove cases with `UseZicond` as the predicate and proper costs. We already have two sets of match rules for Max and Min. The other one, which should be the most efficient, is in file `riscv_b.ad` [1]. Do we have hardware that implements `Zicond` but doesn't have `Zbb`? [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv_b.ad#L436 ------------- PR Review: https://git.openjdk.org/jdk/pull/24153#pullrequestreview-2709106655 PR Review Comment: https://git.openjdk.org/jdk/pull/24153#discussion_r2009457086 From duke at openjdk.org Mon Mar 24 06:52:08 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 24 Mar 2025 06:52:08 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> Message-ID: On Fri, 21 Mar 2025 08:59:46 GMT, Emanuel Peter wrote: >> @eme64 @robcasloz I think the patch for merge loads optimization is ready for PR, could you take time to review it? Thanks. > > @kuaiwei Just ping me when you would like me to re-review :) Hi @eme64, I made the changes requested in the comments and merged with the master branch. Testing shows no issues. Could you help check it again? Thanks.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2747058464 From duke at openjdk.org Mon Mar 24 06:52:09 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 24 Mar 2025 06:52:09 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: <8pcR6tQ3Zv8FRCLRxaG57NuZlVDB4LD9mCSKgHmlKEs=.a04c40ed-4d80-4eea-a573-abb3446d1ab9@github.com> Message-ID: On Tue, 18 Mar 2025 09:46:25 GMT, Emanuel Peter wrote: >> I think it's ok to swap. I collected merged mem info and sorted them by shift value. Then check the memory order. So if shift order follows memory access order (or reverse), they can be merged. > > Ah nice! That means you could add some tests where the order is shuffled, right? I added tests which shuffle the order. Could you check if they are expected? Thanks. >> Now there's limit to merge 2 LoadI as LoadL. >> For byte and short, there's already unsigned load for them in C2, so they can extend safely. But there's no unsigned load for integer, so I stop merging 2 integer load in this patch. I will check if it can be done in other way. > >> Now there's limit to merge 2 LoadI as LoadL. > > Exactly, and that is fine. 
But you need a test where you merge two ints ;) Tests of merging int are added ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2009548405 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2009549365 From stuefe at openjdk.org Mon Mar 24 07:02:06 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Mar 2025 07:02:06 GMT Subject: RFR: 8352486: [ubsan] compilationMemoryStatistic.cpp:659:21: runtime error: index 64 out of bounds for type const struct unnamed struct In-Reply-To: References: Message-ID: <2h_yjsvQTlASVuWpZNzOr7Yn2968pQQZNaeJd3Z2-mU=.11196f7f-4ba4-42d5-b425-6d827d7a14ac@github.com> On Fri, 21 Mar 2025 14:20:28 GMT, Matthias Baesken wrote: > When running ubsan enabled binaries on macOS aarch, the test serviceability/dcmd/compiler/CompilerMemoryStatisticTest triggers the following warning : > > > /priv/jenkins/client-home/workspace/openjdk-jdk-weekly-macos_aarch64-opt/jdk/src/hotspot/share/compiler/compilationMemoryStatistic.cpp:659:21: runtime error: index 64 out of bounds for type 'const struct (unnamed struct at /priv/jenkins/client-home/workspace/openjdk-jdk-weekly-macos_aarch64-opt/jdk/src/hotspot/share/compiler/compilationMemoryStatistic.cpp:649:3)[64]' > #0 0x108d99cf0 in void MemStatStore::iterate_sorted_filtered(MemStatStore::print_table(outputStream*, bool, unsigned long, int) const::'lambda'(MemStatEntry const*), unsigned long, int, MemStatStore::iteration_result&) const compilationMemoryStatistic.cpp:659 > #1 0x108d97c68 in MemStatStore::print_table(outputStream*, bool, unsigned long, int) const compilationMemoryStatistic.cpp:734 > #2 0x108d9789c in CompilationMemoryStatistic::print_all_by_size(outputStream*, bool, bool, unsigned long, int) compilationMemoryStatistic.cpp:1044 > #3 0x108d97b74 in CompilationMemoryStatistic::print_jcmd_report(outputStream*, bool, bool, unsigned long) compilationMemoryStatistic.cpp:1036 > #4 0x1090cc72c in DCmd::Executor::execute(DCmd*, JavaThread*) 
diagnosticFramework.cpp:421 > #5 0x108ab6a0c in jcmd(AttachOperation*, attachStream*)::Executor::execute(DCmd*, JavaThread*) attachListener.cpp:391 > #6 0x1090cbf64 in DCmd::Executor::parse_and_execute(char const*, char, JavaThread*) diagnosticFramework.cpp:414 > #7 0x108ab5f98 in jcmd(AttachOperation*, attachStream*) attachListener.cpp:395 > #8 0x108ab2db0 in AttachListenerThread::thread_entry(JavaThread*, JavaThread*) attachListener.cpp:636 > #9 0x1093cc254 in JavaThread::thread_main_inner() javaThread.cpp:776 > #10 0x109c08d68 in Thread::call_run() thread.cpp:231 > #11 0x10992dc44 in thread_native_entry(Thread*) os_bsd.cpp:601 > #12 0x1936fef90 in _pthread_start+0x84 (libsystem_pthread.dylib:arm64e+0x6f90) > #13 0x1936f9d30 in thread_start+0x4 (libsystem_pthread.dylib:arm64e+0x1d30) Good. Thanks @MBaesken ! ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24156#pullrequestreview-2709295488 From duke at openjdk.org Mon Mar 24 07:24:13 2025 From: duke at openjdk.org (duke) Date: Mon, 24 Mar 2025 07:24:13 GMT Subject: RFR: 8352591: Missing UnlockDiagnosticVMOptions in VerifyGraphEdgesWithDeadCodeCheckFromSafepoints test In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 10:38:09 GMT, Marc Chevalier wrote: > Using `StressIGVN` in product requires `UnlockDiagnosticVMOptions`. My bad > > Thanks, > Marc @marc-chevalier Your change (at version cc8fd3247fe7882c6ab200cd586e115046b28cf4) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24151#issuecomment-2747116408 From duke at openjdk.org Mon Mar 24 07:24:12 2025 From: duke at openjdk.org (Marc Chevalier) Date: Mon, 24 Mar 2025 07:24:12 GMT Subject: RFR: 8352591: Missing UnlockDiagnosticVMOptions in VerifyGraphEdgesWithDeadCodeCheckFromSafepoints test In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 10:38:09 GMT, Marc Chevalier wrote: > Using `StressIGVN` in product requires `UnlockDiagnosticVMOptions`. 
My bad > > Thanks, > Marc Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24151#issuecomment-2747114181 From duke at openjdk.org Mon Mar 24 07:27:12 2025 From: duke at openjdk.org (Marc Chevalier) Date: Mon, 24 Mar 2025 07:27:12 GMT Subject: Integrated: 8352591: Missing UnlockDiagnosticVMOptions in VerifyGraphEdgesWithDeadCodeCheckFromSafepoints test In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 10:38:09 GMT, Marc Chevalier wrote: > Using `StressIGVN` in product requires `UnlockDiagnosticVMOptions`. My bad > > Thanks, > Marc This pull request has now been integrated. Changeset: e23e0f85 Author: Marc Chevalier Committer: SendaoYan URL: https://git.openjdk.org/jdk/commit/e23e0f85ef0f959a68adda0cff9e721ba2173ffc Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8352591: Missing UnlockDiagnosticVMOptions in VerifyGraphEdgesWithDeadCodeCheckFromSafepoints test Reviewed-by: kvn, chagedorn, syan ------------- PR: https://git.openjdk.org/jdk/pull/24151 From dfenacci at openjdk.org Mon Mar 24 07:34:13 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 24 Mar 2025 07:34:13 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v16] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 15:51:53 GMT, Marc Chevalier wrote: >> This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). Care must be taken in the case con1 + con2 is bigger than the number of bits in the integer type. In this case, we must simplify to 0. >> >> Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32 bits into a 16 bits integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simple `x`. 
But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR looks like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitude. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. >> >> This also works for multiplications by powers of 2 since they are already translated into shifts. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Rephrase comment Thanks for reworking on the explanation etc. @marc-chevalier! Still looks ok to me. ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/23728#pullrequestreview-2709372803 From swen at openjdk.org Mon Mar 24 07:51:17 2025 From: swen at openjdk.org (Shaojin Wen) Date: Mon, 24 Mar 2025 07:51:17 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v24] In-Reply-To: References: Message-ID: On Sat, 1 Feb 2025 09:17:38 GMT, Quan Anh Mai wrote: >> Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: >> >> add sanity asserts to tests > > Very nice, I think the patch looks good, please do another round of style refinement. In particular, make sure that there is no white space after `(` or before `)`, and after `if` or `for` we prefer having a whitespace before the `(`. @merykitty Could you review the changed code? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2747188174 From epeter at openjdk.org Mon Mar 24 07:52:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 07:52:21 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v14] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 20:29:41 GMT, Srinivas Vamsi Parasa wrote: >>> > @vamsi-parasa I tried to launch testing, but my script fails because of some merge issue. Would you mind merging from master? >>> >>> Hi Emanuel (@eme64), please see the updated code after the merge with master. >> >> Hi Emanuel (@eme64), could you please let me know if you're still seeing script failure? > >> @vamsi-parasa I launched testing now. Please ping me after the weekend for results :) > > Thank you, Emanuel! @vamsi-parasa Testing looks good / no related test failures. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2747190368 From epeter at openjdk.org Mon Mar 24 07:55:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 07:55:13 GMT Subject: RFR: 8350609: Cleanup unknown unwind opcode (0xB) for windows In-Reply-To: References: <4BUZBMlLC1zPnsieDNsbkji0xcmHLd_VHRN3bEhpJ3A=.5d2b315c-20de-4770-93e1-846cd733cde0@github.com> Message-ID: On Fri, 21 Mar 2025 18:05:54 GMT, Dhamoder Nalla wrote: >>> @dhanalla As @vivdesh asked above: do you have a regression test for this? >>> >>> You also have this warning above: >>> >>> Warning: Found leading lowercase letter in issue title for 8350609: cleanup unknown unwind opcode (0xB) for windows >> I addressed the warning, and regarding regression tests, I have validated the Jtreg Tier 1 tests, including vector-specific tests under /test/jdk/incubator/vector. > >> @dhanalla Thanks for fixing the warning and running some tests! >> >> From my understanding, those tests passed before your patch here, correct?
If so, then I'm wondering if there could be a regression test for this "unknown unwind opcode", that fails before your patch and passes with your patch? How feasible is that? > > Thanks @eme64, > We are just cleaning up the unknown unwind codes that are not required. The unwind instructions in the .text section remain untouched. > The only difference we see after this change is that the output of 'dumpbin.exe /unwindinfo jsvml.dll' will no longer display any unknown unwind opcodes identified in the DLL corresponding to these methods. @dhanalla Ok, fair enough. It's tricky to test this explicitly, fair enough. I launched some testing, please ping me in a day for the results! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23707#issuecomment-2747196869 From dfenacci at openjdk.org Mon Mar 24 07:58:09 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 24 Mar 2025 07:58:09 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v6] In-Reply-To: References: Message-ID: > # Issue > > The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2 compiling the test `testAndMaskSameValue1` expects to have 1 `AndV` node but it has none. > > # Cause > > The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type.
> The graph that leads to the issue looks like this: > ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) > The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: > ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) > The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead. > The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`. > > This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match. > > # Solution > > Instead of fixing this specific case we try a more generic approach: when late inlining we keep track of failed intrinsics and re-examine them during IGVN. If the `Ideal` method for their call node is called, we reschedule the intrinsic attempt for that call.
> > # Testing > > Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) > > Tests: Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/callnode.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21682/files - new: https://git.openjdk.org/jdk/pull/21682/files/9406c6e2..bbaf9859 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21682&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21682&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21682/head:pull/21682 PR: https://git.openjdk.org/jdk/pull/21682 From dfenacci at openjdk.org Mon Mar 24 08:01:00 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 24 Mar 2025 08:01:00 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v7] In-Reply-To: References: Message-ID: <77cSpEcathieqZqSWq9M8i_Rl5BItnTdV6_9DNdL6k4=.bfac9826-a998-4243-9e70-29a8d42a1a54@github.com> > # Issue > > The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2 compiling the test `testAndMaskSameValue1` expects to have 1 `AndV` nodes but it has none. > > # Cause > > The issue has to do with the criteria that trigger a cleanup when performing late inlining. 
In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type. > The graph that leads to the issue looks like this: > ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) > The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: > ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) > The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead. > The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`. > > This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match. > > # Solution > > Instead of fixing this specific case we try a more generic approach: when late inlining we keep track of failed intrinsics and re-examine them during IGVN. If the `Ideal` method for their call node is called, we reschedule the intrinsic attempt for that call. 
> > # Testing > > Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) > > Tests: Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/callnode.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21682/files - new: https://git.openjdk.org/jdk/pull/21682/files/bbaf9859..cfa5252b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21682&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21682&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21682/head:pull/21682 PR: https://git.openjdk.org/jdk/pull/21682 From duke at openjdk.org Mon Mar 24 08:02:20 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 24 Mar 2025 08:02:20 GMT Subject: RFR: 8351515: C2 incorrectly removes double negation for double and float [v3] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 13:10:25 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> A fuzzer run discovered that code of the form `0 - (0 - x)` yields the wrong result for `x = -0.0`. According to the [Java Language Spec 15.18.2](https://docs.oracle.com/javase/specs/jls/se23/html/jls-15.html#jls-15.18.2) the sum of floating point zeroes of opposite sign is positive zero. 
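The rule just cited can be checked with a small Java sketch. This is an illustration only and not the regression test from the PR; the class and method names (`NegativeZeroDemo`, `doubleNegate`, `isNegativeZero`) are made up for this example. It evaluates `0 - (0 - x)` as written and compares raw bit patterns, because `+0.0 == -0.0` is true under `==`.

```java
public class NegativeZeroDemo {
    // Evaluates 0 - (0 - x) literally, with no algebraic simplification in the source.
    public static double doubleNegate(double x) {
        return 0.0 - (0.0 - x);
    }

    // +0.0 and -0.0 compare equal with ==, so the sign has to be read from the bits.
    public static boolean isNegativeZero(double d) {
        return Double.doubleToRawLongBits(d) == Double.doubleToRawLongBits(-0.0);
    }

    public static void main(String[] args) {
        double x = -0.0;
        double r = doubleNegate(x);
        // Inner step: 0.0 - (-0.0) is the sum 0.0 + 0.0, which is +0.0.
        // Outer step: 0.0 - (+0.0) is the sum of zeroes of opposite sign, which is +0.0.
        System.out.println("x is negative zero: " + isNegativeZero(x));
        System.out.println("0 - (0 - x) is negative zero: " + isNegativeZero(r));
    }
}
```

Folding `0 - (0 - x)` to `x` would return `-0.0` here, which is why the sign of zero matters for this identity.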
Hence, `0 - (0 - (-0.0))` must result in `+0.0`, but due to the folding of all double negations in `SubNode::Identity` this was optimized to `-0.0`. >> >> # Changeset overview >> >> To fix this issue, I excluded floating point numbers from the folding of double negations, which also includes `Float16` values. This might seem excessive at first glance, but we do not track range information for floating point types. Hence, we could still perform the folding of the double negation if a floating point constant is not `-0.0`. However, constant folding already takes care of this. >> >> Changes: >> - IR-Framework: fix `IRNode.SUB` not matching `SubHFNode` >> - Add a regression IR-test >> - Exclude floating point `SubNodes` from folding double negations >> >> # Testing >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/13989207348) >> - `tier1` through `tier5` plus Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment wording > > Co-authored-by: Tobias Hartmann Thanks for the review and helpful comments everyone! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24150#issuecomment-2747211369 From duke at openjdk.org Mon Mar 24 08:02:21 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 24 Mar 2025 08:02:21 GMT Subject: Integrated: 8351515: C2 incorrectly removes double negation for double and float In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 10:20:53 GMT, Manuel Hässig wrote: > # Issue Summary > > A fuzzer run discovered that code of the form `0 - (0 - x)` yields the wrong result for `x = -0.0`. According to the [Java Language Spec 15.18.2](https://docs.oracle.com/javase/specs/jls/se23/html/jls-15.html#jls-15.18.2) the sum of floating point zeroes of opposite sign is positive zero. 
Hence, `0 - (0 - (-0.0))` must result in `+0.0`, but due to the folding of all double negations in `SubNode::Identity` this was optimized to `-0.0`. > > # Changeset overview > > To fix this issue, I excluded floating point numbers from the folding of double negations, which also includes `Float16` values. This might seem excessive at first glance, but we do not track range information for floating point types. Hence, we could still perform the folding of the double negation if a floating point constant is not `-0.0`. However, constant folding already takes care of this. > > Changes: > - IR-Framework: fix `IRNode.SUB` not matching `SubHFNode` > - Add a regression IR-test > - Exclude floating point `SubNodes` from folding double negations > > # Testing > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/13989207348) > - `tier1` through `tier5` plus Oracle internal testing This pull request has now been integrated. Changeset: 5591f8a4 Author: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/5591f8a42997c7bbe99d26f7a75d494a53e436fa Stats: 90 lines in 3 files changed: 85 ins; 0 del; 5 mod 8351515: C2 incorrectly removes double negation for double and float Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24150 From mbaesken at openjdk.org Mon Mar 24 08:10:20 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 24 Mar 2025 08:10:20 GMT Subject: Integrated: 8352486: [ubsan] compilationMemoryStatistic.cpp:659:21: runtime error: index 64 out of bounds for type const struct unnamed struct In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 14:20:28 GMT, Matthias Baesken wrote: > When running ubsan enabled binaries on macOS aarch, the test serviceability/dcmd/compiler/CompilerMemoryStatisticTest triggers the following warning : > > > /priv/jenkins/client-home/workspace/openjdk-jdk-weekly-macos_aarch64-opt/jdk/src/hotspot/share/compiler/compilationMemoryStatistic.cpp:659:21: runtime error: index 64 out of 
bounds for type 'const struct (unnamed struct at /priv/jenkins/client-home/workspace/openjdk-jdk-weekly-macos_aarch64-opt/jdk/src/hotspot/share/compiler/compilationMemoryStatistic.cpp:649:3)[64]' > #0 0x108d99cf0 in void MemStatStore::iterate_sorted_filtered(MemStatStore::print_table(outputStream*, bool, unsigned long, int) const::'lambda'(MemStatEntry const*), unsigned long, int, MemStatStore::iteration_result&) const compilationMemoryStatistic.cpp:659 > #1 0x108d97c68 in MemStatStore::print_table(outputStream*, bool, unsigned long, int) const compilationMemoryStatistic.cpp:734 > #2 0x108d9789c in CompilationMemoryStatistic::print_all_by_size(outputStream*, bool, bool, unsigned long, int) compilationMemoryStatistic.cpp:1044 > #3 0x108d97b74 in CompilationMemoryStatistic::print_jcmd_report(outputStream*, bool, bool, unsigned long) compilationMemoryStatistic.cpp:1036 > #4 0x1090cc72c in DCmd::Executor::execute(DCmd*, JavaThread*) diagnosticFramework.cpp:421 > #5 0x108ab6a0c in jcmd(AttachOperation*, attachStream*)::Executor::execute(DCmd*, JavaThread*) attachListener.cpp:391 > #6 0x1090cbf64 in DCmd::Executor::parse_and_execute(char const*, char, JavaThread*) diagnosticFramework.cpp:414 > #7 0x108ab5f98 in jcmd(AttachOperation*, attachStream*) attachListener.cpp:395 > #8 0x108ab2db0 in AttachListenerThread::thread_entry(JavaThread*, JavaThread*) attachListener.cpp:636 > #9 0x1093cc254 in JavaThread::thread_main_inner() javaThread.cpp:776 > #10 0x109c08d68 in Thread::call_run() thread.cpp:231 > #11 0x10992dc44 in thread_native_entry(Thread*) os_bsd.cpp:601 > #12 0x1936fef90 in _pthread_start+0x84 (libsystem_pthread.dylib:arm64e+0x6f90) > #13 0x1936f9d30 in thread_start+0x4 (libsystem_pthread.dylib:arm64e+0x1d30) This pull request has now been integrated. 
Changeset: a8757332 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/a8757332667df3fe41a29a7eedb2a7234d23c2a0 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8352486: [ubsan] compilationMemoryStatistic.cpp:659:21: runtime error: index 64 out of bounds for type const struct unnamed struct Reviewed-by: kvn, mdoerr, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/24156 From mbaesken at openjdk.org Mon Mar 24 08:10:20 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 24 Mar 2025 08:10:20 GMT Subject: RFR: 8352486: [ubsan] compilationMemoryStatistic.cpp:659:21: runtime error: index 64 out of bounds for type const struct unnamed struct In-Reply-To: References: Message-ID: <0Agp9Nv36vKNDhgVcxte88EEg0b9a9MBggl0s2oWHLU=.32a7e8d5-b6e9-4a8f-99b6-ae27ff839701@github.com> On Fri, 21 Mar 2025 14:20:28 GMT, Matthias Baesken wrote: > When running ubsan enabled binaries on macOS aarch, the test serviceability/dcmd/compiler/CompilerMemoryStatisticTest triggers the following warning : > > > /priv/jenkins/client-home/workspace/openjdk-jdk-weekly-macos_aarch64-opt/jdk/src/hotspot/share/compiler/compilationMemoryStatistic.cpp:659:21: runtime error: index 64 out of bounds for type 'const struct (unnamed struct at /priv/jenkins/client-home/workspace/openjdk-jdk-weekly-macos_aarch64-opt/jdk/src/hotspot/share/compiler/compilationMemoryStatistic.cpp:649:3)[64]' > #0 0x108d99cf0 in void MemStatStore::iterate_sorted_filtered(MemStatStore::print_table(outputStream*, bool, unsigned long, int) const::'lambda'(MemStatEntry const*), unsigned long, int, MemStatStore::iteration_result&) const compilationMemoryStatistic.cpp:659 > #1 0x108d97c68 in MemStatStore::print_table(outputStream*, bool, unsigned long, int) const compilationMemoryStatistic.cpp:734 > #2 0x108d9789c in CompilationMemoryStatistic::print_all_by_size(outputStream*, bool, bool, unsigned long, int) compilationMemoryStatistic.cpp:1044 > #3 0x108d97b74 in 
CompilationMemoryStatistic::print_jcmd_report(outputStream*, bool, bool, unsigned long) compilationMemoryStatistic.cpp:1036 > #4 0x1090cc72c in DCmd::Executor::execute(DCmd*, JavaThread*) diagnosticFramework.cpp:421 > #5 0x108ab6a0c in jcmd(AttachOperation*, attachStream*)::Executor::execute(DCmd*, JavaThread*) attachListener.cpp:391 > #6 0x1090cbf64 in DCmd::Executor::parse_and_execute(char const*, char, JavaThread*) diagnosticFramework.cpp:414 > #7 0x108ab5f98 in jcmd(AttachOperation*, attachStream*) attachListener.cpp:395 > #8 0x108ab2db0 in AttachListenerThread::thread_entry(JavaThread*, JavaThread*) attachListener.cpp:636 > #9 0x1093cc254 in JavaThread::thread_main_inner() javaThread.cpp:776 > #10 0x109c08d68 in Thread::call_run() thread.cpp:231 > #11 0x10992dc44 in thread_native_entry(Thread*) os_bsd.cpp:601 > #12 0x1936fef90 in _pthread_start+0x84 (libsystem_pthread.dylib:arm64e+0x6f90) > #13 0x1936f9d30 in thread_start+0x4 (libsystem_pthread.dylib:arm64e+0x1d30) Thanks for the reviews ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24156#issuecomment-2747232492 From rcastanedalo at openjdk.org Mon Mar 24 08:30:12 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 24 Mar 2025 08:30:12 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: References: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> <7Fzrb4C-4VyJlOMUaaFqhTzlj4o7dVXS8-EkLCiVVA4=.367fe44f-540c-49cc-aa1c-3ab381febd32@github.com> Message-ID: On Fri, 21 Mar 2025 15:15:00 GMT, Emanuel Peter wrote: >>> What would be a better name though? >> >> @merykitty had the suggestions `MemNode::value_type()` or `MemNode::value_basic_type()` (see comment [above](https://github.com/openjdk/jdk/pull/24005#issuecomment-2724445592)), I like both better than the current name. > >> > What would be a better name though? 
>> >> @merykitty had the suggestions `MemNode::value_type()` or `MemNode::value_basic_type()` (see comment [above](https://github.com/openjdk/jdk/pull/24005#issuecomment-2724445592)), I like both better than the current name. > > @merykitty @robcasloz `MemNode::value_basic_type()` sounds like the most descriptive and accurate. Great! @robcasloz , will you file an RFE for that? Hi @eme64, I implemented your test suggestions (commit b59d2eb2), please re-review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2747277559 From rcastanedalo at openjdk.org Mon Mar 24 08:30:11 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 24 Mar 2025 08:30:11 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call [v4] In-Reply-To: References: Message-ID: > The [array fill optimization](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/share/opto/loopTransform.cpp#L3533) replaces simple innermost loops that fill an array with copies of the same primitive value: > > > for (int i = 0; i < array.length; i++) { > array[i] = 0; > } > > with a call to an [array filling intrinsic](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp#L1665) that is specialized for the array element type: > > > arrayof_jint_fill(array, 0, array.length) > > > The optimization retrieves the (basic) array element type from calling `MemNode::memory_type()` on the original filling store. This is incorrect for stores of `short` values, since these are represented by `StoreC` nodes [whose `memory_type()` is `T_CHAR`](https://github.com/openjdk/jdk/blob/1fe45265e446eeca5dc496085928ce20863a3172/src/hotspot/share/opto/memnode.hpp#L680). As a result, the optimization wrongly assigns the address type `char[]` to `short` array fill loops. 
This can cause miscompilations due to missing anti-dependences, see the [issue description for further detail](https://bugs.openjdk.org/projects/JDK/issues/JDK-8351468). > > This changeset proposes retrieving the (basic) array element type from the store address type instead. This ensures that the accurate address type is assigned to the intrinsic call, preventing missed anti-dependences and other potential issues caused by mismatching types. Additionally, the changeset makes it easier to reason about correctness by explicitly disabling the optimization for mismatched stores (where the type of the value to be stored differs from the element type of the destination array). Such stores were not optimized before, but only due to pattern matching limitations. > > Assuming mismatched stores are discarded (as proposed here), an alternative solution would be to define a StoreS node returning the appropriate `memory_type()`. This could be desirable even as a complement to this fix, to prevent similar bugs in the future. I propose to investigate the introduction of a StoreS node in a separate RFE, because it is a much larger and more intrusive changeset, and go with this minimal, local, and non-intrusive fix for backportability. > > **Testing:** tier1-5, stress testing (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). 
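For readers unfamiliar with the loop shape under discussion, here is a minimal Java sketch. It is not the reproducer from JDK-8351468, and the names (`ShortFillDemo`, `fill`, `readThenFill`) are hypothetical. It shows a `short[]` fill loop of the kind the description refers to, preceded by a load from the same array; whether C2 actually replaces the loop with a fill intrinsic depends on the JIT and the platform, so the sketch only documents the source-level behavior that any compilation must preserve.

```java
import java.util.Arrays;

public class ShortFillDemo {
    // Simple innermost loop storing the same primitive value into every element.
    public static void fill(short[] array, short value) {
        for (int i = 0; i < array.length; i++) {
            array[i] = value;
        }
    }

    // The load happens before the fill, so it must observe the old contents.
    // Keeping the load ahead of the stores is what an anti-dependence expresses.
    public static short readThenFill(short[] array, short value) {
        short before = array[0];
        fill(array, value);
        return before;
    }

    public static void main(String[] args) {
        short[] data = {7, 7, 7, 7};
        short before = readThenFill(data, (short) 0);
        System.out.println(before + " " + Arrays.toString(data));
    }
}
```

If the fill were treated as writing to `char[]` memory rather than `short[]` memory, nothing in the graph would tie the earlier `short` load to it, which is the kind of missed anti-dependence the description mentions.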
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Relax test cases ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24005/files - new: https://git.openjdk.org/jdk/pull/24005/files/38c9b475..b59d2eb2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24005&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24005&range=02-03 Stats: 13 lines in 2 files changed: 10 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24005.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24005/head:pull/24005 PR: https://git.openjdk.org/jdk/pull/24005 From duke at openjdk.org Mon Mar 24 08:32:21 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 24 Mar 2025 08:32:21 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 08:55:11 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/memnode.cpp line 1979: >> >>> 1977: return l->unique_out(); >>> 1978: } else { >>> 1979: return (Node*)l; >> >> Hmm, I don't like casting away `const`... is there a way to avoid this? > > Could the output pointer be `const`? Yes, it's changed to `const` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2009698545 From rehn at openjdk.org Mon Mar 24 08:36:13 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 24 Mar 2025 08:36:13 GMT Subject: RFR: 8352615: [Test] RISC-V: TestVectorizationMultiInvar.java fails on riscv64 without rvv support In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 14:59:21 GMT, Hamlin Li wrote: > Hi, > Can you help to review this trivial patch? > TestVectorizationMultiInvar.java fails on riscv if rvv is not supported, as it will verify the `MaxVectorSize > 0` in the test framework. > > Thanks! Thanks ------------- Marked as reviewed by rehn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24157#pullrequestreview-2709551042 From rehn at openjdk.org Mon Mar 24 08:45:11 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 24 Mar 2025 08:45:11 GMT Subject: RFR: 8352607: RISC-V: use cmove in min/max when Zicond is supported In-Reply-To: References: Message-ID: <6Vx1NT54rOWXHIAj1Ug2Ike3A7bTmIYSOGzp09GA0pg=.1ccd1d10-fed8-4194-942d-a25d7c39b68b@github.com> On Mon, 24 Mar 2025 04:55:12 GMT, Fei Yang wrote: >> Hi, >> Can you help to review this patch? >> We can let min/max use cmove if Zicond is supported rather than a branch. >> At the same time, this patch also simplifies the code of min/max. >> >> Thanks! > > src/hotspot/cpu/riscv/riscv.ad line 9073: > >> 9071: ins_encode %{ >> 9072: __ cmov_gt(as_Register($dst$$reg), as_Register($src$$reg), >> 9073: as_Register($dst$$reg), as_Register($src$$reg)); > > But the `ins_cost` isn't updated to reflect this change? It is still `BRANCH_COST + ALU_COST` which will only reflect the branch code. Seems better to create separate match rules for these cmove cases with `UseZicond` as the predicate and proper costs. We already have two sets of match rules for Max and Min. The other one which should be the most efficient is in file `riscv_b.ad` [1]. Do we have hardware which implements `Zicond` but doesn't have `Zbb`? > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv_b.ad#L436 Yes, better to have separate match rules. But I also agree that it's a bit messy and unclear how many will actually use that version. And all of these versions increase test cost. I think we should aim for RVA23 as the well supported and tested set of extensions. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24153#discussion_r2009716924 From thartmann at openjdk.org Mon Mar 24 08:50:24 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 24 Mar 2025 08:50:24 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v5] In-Reply-To: <3TUaCIHwDAl8dK1hATRK8m5XZIK1oeY8231x1HaLl3s=.ac07bdce-7807-461f-8d5a-906d50d1c411@github.com> References: <4gi1QLJRikwQR2ShA9zy_cOK4NDsjrJK4ZyyuzuNLjc=.924da387-362c-4a39-b4da-0d347c72d354@github.com> <3TUaCIHwDAl8dK1hATRK8m5XZIK1oeY8231x1HaLl3s=.ac07bdce-7807-461f-8d5a-906d50d1c411@github.com> Message-ID: <09Nnwhd1-0I_uMC6zFEvJYny8Ysf008kxUV-hmA0mc0=.4070a496-c18b-4415-bb26-9b3d81923f87@github.com> On Fri, 21 Mar 2025 22:47:30 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/opto/compile.cpp line 2050: >> >>> 2048: assert(is_scheduled_for_igvn_before == is_scheduled_for_igvn_after, "call node removed from IGVN list during inlining pass"); >>> 2049: cg->call_node()->set_generator(cg); >>> 2050: } >> >> I find this a bit hard to read. Wouldn't it be semantically equivalent to this? >> >> >> if (is_scheduled_for_igvn_before == is_scheduled_for_igvn_after) { >> cg->call_node()->set_generator(cg); >> } else { >> assert(false, "Some useful message"); >> } >> >> >> We wouldn't have separate asserts for the two cases, but I think that's fine since one can easily figure it out from the boolean values. > > The difference is whether a call can be scheduled for a repeated inlining attempt in the future. > > `cg->call_node()->set_generator(cg)` reinitializes `cg` in `CallNode` and lets IGVN submit it for incremental inlining during future passes. > > The first check guards against a situation when the call node is already on the IGVN list (so, it will be automatically rescheduled for inlining during the next IGVN pass causing an infinite loop in incremental inlining). 
> > The second assert catches a suspicious situation when the call node disappears from the IGVN worklist during a failed inlining attempt. IMO it should not happen, hence the assert. But it is benign to allow repeated inlining in such a case. Okay, thanks for the background. If we keep both asserts, some comments explaining this would be helpful. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r2009723895 From chagedorn at openjdk.org Mon Mar 24 09:08:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 24 Mar 2025 09:08:12 GMT Subject: RFR: 8211759: C2: Graph after optimizations should not have dead nodes In-Reply-To: References: Message-ID: On Sun, 23 Mar 2025 08:23:43 GMT, Zihao Lin wrote: > Move the check_no_dead_use() call after the final_graph_reshaping() call to catch dead nodes. When reading the description and the linked review thread in the JBS issue, it sounds like there are possibly cases where dead nodes could still be here after `final_graph_reshaping()` and this should be addressed with this JBS issue. Your patch now suggests that there are no (more?) such cases and the "no dead node" verification can simply be moved after `final_graph_reshaping()`. Can you elaborate more on how you concluded that and/or what kind of testing you did to have enough confidence that this is indeed the case? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24175#issuecomment-2747377775 From mli at openjdk.org Mon Mar 24 09:11:06 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 24 Mar 2025 09:11:06 GMT Subject: RFR: 8352615: [Test] RISC-V: TestVectorizationMultiInvar.java fails on riscv64 without rvv support In-Reply-To: References: Message-ID: On Sat, 22 Mar 2025 02:37:19 GMT, Fei Yang wrote: > Looks good. Seems this won't manifest on riscv64 platforms where `AlignVector` is true. Thank you! Yes, there is a `!((Boolean)alignVector)` condition to really run the test. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24157#issuecomment-2747386324 From mli at openjdk.org Mon Mar 24 09:11:06 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 24 Mar 2025 09:11:06 GMT Subject: RFR: 8352615: [Test] RISC-V: TestVectorizationMultiInvar.java fails on riscv64 without rvv support In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 08:33:51 GMT, Robbin Ehn wrote: > Thanks Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24157#issuecomment-2747386776 From mli at openjdk.org Mon Mar 24 09:27:20 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 24 Mar 2025 09:27:20 GMT Subject: RFR: 8352607: RISC-V: use cmove in min/max when Zicond is supported In-Reply-To: <6Vx1NT54rOWXHIAj1Ug2Ike3A7bTmIYSOGzp09GA0pg=.1ccd1d10-fed8-4194-942d-a25d7c39b68b@github.com> References: <6Vx1NT54rOWXHIAj1Ug2Ike3A7bTmIYSOGzp09GA0pg=.1ccd1d10-fed8-4194-942d-a25d7c39b68b@github.com> Message-ID: On Mon, 24 Mar 2025 08:42:34 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/riscv.ad line 9073: >> >>> 9071: ins_encode %{ >>> 9072: __ cmov_gt(as_Register($dst$$reg), as_Register($src$$reg), >>> 9073: as_Register($dst$$reg), as_Register($src$$reg)); >> >> But the `ins_cost` isn't updated to reflect this change? It is still `BRANCH_COST + ALU_COST` which will only reflect the branch code. Seems better to create separate match rules for these cmove cases with `UseZicond` as the predicate and proper costs. We already have two sets of match rules for Max and Min. The other one which should be the most efficient is in file `riscv_b.ad` [1]. Do we have hardware which implements `Zicond` but doesn't have `Zbb`? >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv_b.ad#L436 > > Yes, better to have separate match rules. > > But I also agree that it's a bit messy and unclear how many will actually use that version. > And all of these versions increase test cost. 
> I think we should aim for RVA23 as the well supported and tested set of extensions. For performance considerations please check the existing `CMoveI`, which calls `enc_cmove` which calls `cmov_xx`. So if `CMoveI` brings benefit when `UseZicond == true` over `UseZicond != true`, this refactoring should also work as expected, as they use the same code. As for the question, whether hardware will support `zicond` but not `zbb`, I have no answer. But, anyway in fact you can just consider this as a code cleanup, in this sense seems it should be good? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24153#discussion_r2009784521 From mli at openjdk.org Mon Mar 24 09:30:11 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 24 Mar 2025 09:30:11 GMT Subject: RFR: 8352607: RISC-V: use cmove in min/max when Zicond is supported In-Reply-To: References: <6Vx1NT54rOWXHIAj1Ug2Ike3A7bTmIYSOGzp09GA0pg=.1ccd1d10-fed8-4194-942d-a25d7c39b68b@github.com> Message-ID: On Mon, 24 Mar 2025 09:24:28 GMT, Hamlin Li wrote: >> Yes, better to have separate match rules. >> >> But I also agree that it's a bit messy and unclear how many will actually use that version. >> And all of these versions increase test cost. >> I think we should aim for RVA23 as the well supported and tested set of extensions. > > For performance considerations please check the existing `CMoveI`, which calls `enc_cmove` which calls `cmov_xx`. So if `CMoveI` brings benefit when `UseZicond == true` over `UseZicond != true`, this refactoring should also work as expected, as they use the same code. > > As for the question, whether hardware will support `zicond` but not `zbb`, I have no answer. > > But, anyway in fact you can just consider this as a code cleanup, in this sense seems it should be good? > better to have separate match rules. We could also do the similar thing to `CMoveX`, for this part I can do it in a separate PR together if this one is accepted. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24153#discussion_r2009789893 From chagedorn at openjdk.org Mon Mar 24 09:32:19 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 24 Mar 2025 09:32:19 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: <2OJ3-DbdBBiHjZKVDGNXb-IFLqXi0jI4Ezz_zs6NsMA=.a904b56a-a3c4-4f06-a708-70223beccb60@github.com> Message-ID: On Fri, 21 Mar 2025 16:25:14 GMT, Roland Westrelin wrote: > > But also a problem, indeed. I just think that going into the future, we should still make a reasonable effort to try and let the control path die sanely without needing this patch. It should only serve as a last resort to avoid breaking the graph. While I think it's the safest solution, my concern is that we will not find inefficiencies anymore with this patch. For example, if someone breaks Assertion Predicates, how can we detect this when the graph will always be sane? It's especially tricky now that I'm still adding Assertion Predicate patches and things might break during development and it goes unnoticed. But maybe I just need to turn this patch off locally. > > I agree with that. So ideally the code for this patch should only execute for those cases not properly handled some other way. I tried to figure out a way to do that but concluded it was not really possible. One thing could be to have a flag on `Compile` that's only set to true once a dangerous transformation is performed (in the case of this test case, some transformation involving cast nodes that widens the type at some point in the graph). The new logic would only execute when that flag is true. Do you think it's worth trying or would the logic still run too often to catch bugs elsewhere? I guess it could be worth if these dangerous transformations are not executed too often or if we can efficiently have checks to only apply the patch to these problematic cases. 
But I see that it's probably quite difficult to detect those and we may end up running with the patch enabled in a lot of cases to be on the safe side and shadowing issues we could address and fix easily. So, I'm not sure how much effort we should put into that or if we should just have a global flag to disable this patch (enabled by default) and have some stress job that runs with the patch disabled. There is a risk of having false positive reports with that. But given how rarely we see a broken graph today traced back to these unsolvable cases, it might be justified. And we could still remove the flag again if we see many reports with this flag disabled. The story might be different with JDK-8275202 in and we could get a lot more reports with the flag disabled. But I assume you have a special flag guarding the new optimizations of 8275202. We could then just enforce the flag to enable this patch to be true when the optimizations of 8275202 are enabled with the dedicated flag. > I think we want to leave the `Halt` nodes in the final code. Investigating crashes when compiled code executes is somewhat trickier than crashes when compiling and that's a drawback of this patch. If the `Halt` nodes are removed then, in case of a bug where a path that's expected unreachable is taken, execution could proceed and fail only much later. That would lead to much harder and mysterious bugs. > > Also it's not guaranteed that there's an `If` right before the `Cast` or that that `If` is actually the condition guarding the `Cast`. Sounds like it's safer to just sanely crash with a `halt` instead of executing unreachable paths by mistake which could have unpredictable behavior. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2747446979 From rehn at openjdk.org Mon Mar 24 09:36:05 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 24 Mar 2025 09:36:05 GMT Subject: RFR: 8352607: RISC-V: use cmove in min/max when Zicond is supported In-Reply-To: References: <6Vx1NT54rOWXHIAj1Ug2Ike3A7bTmIYSOGzp09GA0pg=.1ccd1d10-fed8-4194-942d-a25d7c39b68b@github.com> Message-ID: <6shrH4rG_Ne1r7q99ptFITsnkZpo89ShLiXfOTkcfG0=.186f153b-7539-48d5-bb21-dce816f7f9f5@github.com> On Mon, 24 Mar 2025 09:27:45 GMT, Hamlin Li wrote: >> For performance considerations, please check the existing `CMoveI`, which calls `enc_cmove`, which calls `cmov_xx`. So if `CMoveI` brings a benefit when `UseZicond == true` over `UseZicond != true`, this refactoring should also work as expected, as they use the same code. >> >> As for the question of whether hardware will support `zicond` but not `zbb`, I have no answer. >> >> But anyway, you can just consider this as a code cleanup; in this sense it seems it should be good? > >> better to have separate match rules. > > We could also do a similar thing for `CMoveX`; I can do that part in a separate PR if this one is accepted. Yes, good point, I think we need to update the cost for some others as well. We can do that separately.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24153#discussion_r2009799536 From epeter at openjdk.org Mon Mar 24 09:37:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 09:37:14 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call [v4] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 08:30:11 GMT, Roberto Castañeda Lozano wrote: >> The [array fill optimization](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/share/opto/loopTransform.cpp#L3533) replaces simple innermost loops that fill an array with copies of the same primitive value: >> >> >> for (int i = 0; i < array.length; i++) { >> array[i] = 0; >> } >> >> with a call to an [array filling intrinsic](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp#L1665) that is specialized for the array element type: >> >> >> arrayof_jint_fill(array, 0, array.length) >> >> >> The optimization retrieves the (basic) array element type from calling `MemNode::memory_type()` on the original filling store. This is incorrect for stores of `short` values, since these are represented by `StoreC` nodes [whose `memory_type()` is `T_CHAR`](https://github.com/openjdk/jdk/blob/1fe45265e446eeca5dc496085928ce20863a3172/src/hotspot/share/opto/memnode.hpp#L680). As a result, the optimization wrongly assigns the address type `char[]` to `short` array fill loops. This can cause miscompilations due to missing anti-dependences, see the [issue description for further detail](https://bugs.openjdk.org/projects/JDK/issues/JDK-8351468). >> >> This changeset proposes retrieving the (basic) array element type from the store address type instead. This ensures that the accurate address type is assigned to the intrinsic call, preventing missed anti-dependences and other potential issues caused by mismatching types. 
Additionally, the changeset makes it easier to reason about correctness by explicitly disabling the optimization for mismatched stores (where the type of the value to be stored differs from the element type of the destination array). Such stores were not optimized before, but only due to pattern matching limitations. >> >> Assuming mismatched stores are discarded (as proposed here), an alternative solution would be to define a StoreS node returning the appropriate `memory_type()`. This could be desirable even as a complement to this fix, to prevent similar bugs in the future. I propose to investigate the introduction of a StoreS node in a separate RFE, because it is a much larger and more intrusive changeset, and go with this minimal, local, and non-intrusive fix for backportability. >> >> **Testing:** tier1-5, stress testing (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Relax test cases @robcasloz Thanks for the changes! Some thoughts about future work on intrinsic fill. - It would be nice to enable mismatched cases. - And it would be nice to enable not just arrays, but also native memory. That would be especially good for MemorySegments. But not sure how easy this change would be. src/hotspot/share/opto/loopTransform.cpp line 3579: > 3577: if (msg == nullptr && store->as_Mem()->is_mismatched_access()) { > 3578: msg = "mismatched store"; > 3579: } What effect does this have? Ah, it seems to have to do with these comments in your PR: `Additionally, the changeset makes it easier to reason about correctness by explicitly disabling the optimization for mismatched stores (where the type of the value to be stored differs from the element type of the destination array). 
Such stores were not optimized before, but only due to pattern matching limitations.` It may be good to leave additional comments in the code here, saying that this is a limitation, and maybe improved in the future. Up to you. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24005#pullrequestreview-2709715800 PR Review Comment: https://git.openjdk.org/jdk/pull/24005#discussion_r2009796837 From duke at openjdk.org Mon Mar 24 09:42:28 2025 From: duke at openjdk.org (Marc Chevalier) Date: Mon, 24 Mar 2025 09:42:28 GMT Subject: RFR: 8352595: Regression of JDK-8314999 in IR matching Message-ID: A lot of tests for the IR framework used `ALLOC` and friends as a check that would run on the Opto assembly by default, but can also run earlier, but that's no longer the case. There were two kinds of tests to fix: the ones rather about `ALLOC`, where the used or expected compile phases have to change, and the tests where `ALLOC` were just a check that would run on opto assembly. For this, I tried to keep the spirit of the test using other regexes made for this stage. 
------------- Commit messages: - Fix TestCompilePhaseCollector.java - Fix TestPhaseIRMatching.java - Fix TestBadFormat Changes: https://git.openjdk.org/jdk/pull/24163/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24163&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352595 Stats: 74 lines in 3 files changed: 16 ins; 2 del; 56 mod Patch: https://git.openjdk.org/jdk/pull/24163.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24163/head:pull/24163 PR: https://git.openjdk.org/jdk/pull/24163 From chagedorn at openjdk.org Mon Mar 24 09:42:29 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 24 Mar 2025 09:42:29 GMT Subject: RFR: 8352595: Regression of JDK-8314999 in IR matching In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 15:41:23 GMT, Marc Chevalier wrote: > A lot of tests for the IR framework used `ALLOC` and friends as a check that would run on the Opto assembly by default, but can also run earlier, but that's no longer the case. > > There were two kinds of tests to fix: the ones rather about `ALLOC`, where the used or expected compile phases have to change, and the tests where `ALLOC` were just a check that would run on opto assembly. For this, I tried to keep the spirit of the test using other regexes made for this stage. Thanks for addressing this. I have some comments. test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java line 192: > 190: @IR(counts = {IRNode.ALLOC, "2", IRNode.ALLOC_OF, "Object", "1"}) > 191: @ExpectedFailure(ruleId = 5, phase = CompilePhase.BEFORE_MACRO_EXPANSION, counts = 1) > 192: public void defaultOnBoth() { I think for this test you should add a matching on some `PrintOptoAssembly` as well since it tries to verify ideal and opto matching. 
test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/flag/TestCompilePhaseCollector.java line 116: > 114: OPTIMIZE_FINISHED, PRINT_IDEAL); > 115: assertContainsOnly(methodToCompilePhases, testClass, "mix8", PHASEIDEALLOOP1, PHASEIDEALLOOP2, FINAL_CODE, > 116: OPTIMIZE_FINISHED, PRINT_IDEAL); Note that the tests on this file only collect compile phases and actually do not perform any IR matching. So, the IR rules do not need work. This means that for all tests where we use the default compile phase of `ALLOC` (which is `PRINT_OPTO_ASSEMBLY`), we can replace `ALLOC` with just something else that defaults on `PRINT_OPTO_ASSEMBLY`. This just means that we should not replace `PRINT_OPTO_ASSEMBLY` with `BEFORE_MACRO_EXPANSION` here but instead use a different `IRNode` in the IR rules at these tests that default on `PRINT_OPTO_ASSEMBLY`. I've made some comments further down where this applies. The other changes in the file look fine. test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/flag/TestCompilePhaseCollector.java line 222: > 220: @Test > 221: @IR(failOn = {IRNode.CBNZW_HI}) > 222: @IR(counts = {IRNode.CBZW_LS, "> 1"}) I've just noticed that these AArch64 specific `IRNode` entries do not have a check that they are only supported on AArch64. In the past, I've added some platform checks for `IRNode` entries only being supported on certain platforms: https://github.com/openjdk/jdk/blob/e23e0f85ef0f959a68adda0cff9e721ba2173ffc/test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java#L2905-L2925 We should probably add a similar check for these `CB*` entries. But that's something that can be done separately. For this test here, we only collect compile phases and it does not matter on what platforms these `IRNode` entries are eventually used. So, no action to be done in your changes here. 
test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/flag/TestCompilePhaseCollector.java line 417: > 415: @Test > 416: @IR(failOn = IRNode.STORE, phase = {PHASEIDEALLOOP1, DEFAULT, PHASEIDEALLOOP2}) > 417: @IR(counts = {IRNode.LOOP, "3"}, phase = {FINAL_CODE, OPTIMIZE_FINISHED, DEFAULT}) Suggestion: @IR(counts = {IRNode.FIELD_ACCESS, "3"}, phase = {FINAL_CODE, OPTIMIZE_FINISHED, DEFAULT}) test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/flag/TestCompilePhaseCollector.java line 430: > 428: @Test > 429: @IR(failOn = IRNode.ALLOC, phase = {PHASEIDEALLOOP1, PHASEIDEALLOOP2}) > 430: @IR(counts = {IRNode.ALLOC, "3"}, phase = {FINAL_CODE, OPTIMIZE_FINISHED, DEFAULT}) Suggestion: @IR(failOn = IRNode.FIELD_ACCESS, phase = {PHASEIDEALLOOP1, PRINT_OPTO_ASSEMBLY, PHASEIDEALLOOP2}) @IR(counts = {IRNode.FIELD_ACCESS, "3"}, phase = {FINAL_CODE, OPTIMIZE_FINISHED, DEFAULT}) test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/flag/TestCompilePhaseCollector.java line 435: > 433: @Test > 434: @IR(failOn = IRNode.STORE, phase = {PHASEIDEALLOOP1, PRINT_IDEAL, PHASEIDEALLOOP2}) > 435: @IR(counts = {IRNode.LOOP, "3"}, phase = {FINAL_CODE, OPTIMIZE_FINISHED, DEFAULT}) Suggestion: @IR(counts = {IRNode.FIELD_ACCESS, "3"}, phase = {FINAL_CODE, OPTIMIZE_FINISHED, DEFAULT}) test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/flag/TestCompilePhaseCollector.java line 460: > 458: @Test > 459: @IR(counts = {"foo", "3"}, phase = {PHASEIDEALLOOP1, PHASEIDEALLOOP2}) > 460: @IR(failOn = IRNode.LOOP, phase = {FINAL_CODE, OPTIMIZE_FINISHED, DEFAULT}) Suggestion: @IR(failOn = IRNode.FIELD_ACCESS, phase = {FINAL_CODE, OPTIMIZE_FINISHED, DEFAULT}) ------------- Changes requested by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24163#pullrequestreview-2709437094 PR Review Comment: https://git.openjdk.org/jdk/pull/24163#discussion_r2009642233 PR Review Comment: https://git.openjdk.org/jdk/pull/24163#discussion_r2009705679 PR Review Comment: https://git.openjdk.org/jdk/pull/24163#discussion_r2009659733 PR Review Comment: https://git.openjdk.org/jdk/pull/24163#discussion_r2009722404 PR Review Comment: https://git.openjdk.org/jdk/pull/24163#discussion_r2009723100 PR Review Comment: https://git.openjdk.org/jdk/pull/24163#discussion_r2009723308 PR Review Comment: https://git.openjdk.org/jdk/pull/24163#discussion_r2009723545 From dfenacci at openjdk.org Mon Mar 24 10:02:16 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 24 Mar 2025 10:02:16 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v5] In-Reply-To: <4gi1QLJRikwQR2ShA9zy_cOK4NDsjrJK4ZyyuzuNLjc=.924da387-362c-4a39-b4da-0d347c72d354@github.com> References: <4gi1QLJRikwQR2ShA9zy_cOK4NDsjrJK4ZyyuzuNLjc=.924da387-362c-4a39-b4da-0d347c72d354@github.com> Message-ID: On Thu, 20 Mar 2025 12:49:51 GMT, Tobias Hartmann wrote: >> Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: >> >> - JDK-8302459: refactor helper method >> - JDK-8302459: reshape infinite loop check > > src/hotspot/share/opto/callnode.cpp line 1117: > >> 1115: if (phase->C->print_inlining()) { >> 1116: phase->C->inline_printer()->record(cg->method(), cg->call_node()->jvms(), InliningResult::FAILURE, >> 1117: "static call node changed: trying again"); > > FTR, could you share how the PrintInlining output looks now when this code is triggered? 
It looks like this: @ 192 jdk.internal.vm.vector.VectorSupport::binaryOp (38 bytes) failed to inline: failed to inline (intrinsic) failed to inline: static call node changed: trying again (intrinsic) late inline succeeded It seems a bit redundant: the first `failed to inline: failed to inline (intrinsic)` doesn't seem to be needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r2009841029 From rcastanedalo at openjdk.org Mon Mar 24 10:09:37 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 24 Mar 2025 10:09:37 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call [v5] In-Reply-To: References: Message-ID: > The [array fill optimization](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/share/opto/loopTransform.cpp#L3533) replaces simple innermost loops that fill an array with copies of the same primitive value: > > > for (int i = 0; i < array.length; i++) { > array[i] = 0; > } > > with a call to an [array filling intrinsic](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp#L1665) that is specialized for the array element type: > > > arrayof_jint_fill(array, 0, array.length) > > > The optimization retrieves the (basic) array element type from calling `MemNode::memory_type()` on the original filling store. This is incorrect for stores of `short` values, since these are represented by `StoreC` nodes [whose `memory_type()` is `T_CHAR`](https://github.com/openjdk/jdk/blob/1fe45265e446eeca5dc496085928ce20863a3172/src/hotspot/share/opto/memnode.hpp#L680). As a result, the optimization wrongly assigns the address type `char[]` to `short` array fill loops. This can cause miscompilations due to missing anti-dependences, see the [issue description for further detail](https://bugs.openjdk.org/projects/JDK/issues/JDK-8351468). 
> > This changeset proposes retrieving the (basic) array element type from the store address type instead. This ensures that the accurate address type is assigned to the intrinsic call, preventing missed anti-dependences and other potential issues caused by mismatching types. Additionally, the changeset makes it easier to reason about correctness by explicitly disabling the optimization for mismatched stores (where the type of the value to be stored differs from the element type of the destination array). Such stores were not optimized before, but only due to pattern matching limitations. > > Assuming mismatched stores are discarded (as proposed here), an alternative solution would be to define a StoreS node returning the appropriate `memory_type()`. This could be desirable even as a complement to this fix, to prevent similar bugs in the future. I propose to investigate the introduction of a StoreS node in a separate RFE, because it is a much larger and more intrusive changeset, and go with this minimal, local, and non-intrusive fix for backportability. > > **Testing:** tier1-5, stress testing (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). 
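To make the `short` versus `char` distinction above concrete, here is a small Java sketch of the two loop shapes involved. It shows source-level code only. The class and method names (`ArrayFillSketch`, `fillShorts`, `fillChars`) are made up for illustration and do not come from the PR, and whether C2 actually replaces either loop with a fill intrinsic depends on the JVM version, flags, and platform. As the description says, a `short` store is represented by a `StoreC` node whose `memory_type()` is `T_CHAR`, which is why the element type has to be taken from the store address type instead.

```java
public class ArrayFillSketch {
    // Loop shape the description is about: fill a short[] with one value.
    // The element store is a 16-bit store; the array type is short[].
    static void fillShorts(short[] array, short value) {
        for (int i = 0; i < array.length; i++) {
            array[i] = value;
        }
    }

    // Same shape for char[]. The store has the same width as above,
    // but the destination array has a different element type.
    static void fillChars(char[] array, char value) {
        for (int i = 0; i < array.length; i++) {
            array[i] = value;
        }
    }

    public static void main(String[] args) {
        short[] s = new short[8];
        char[] c = new char[8];
        fillShorts(s, (short) -1);
        fillChars(c, 'x');
        // Both arrays must hold the filled value, however the loops are compiled.
        System.out.println(s[7] + " " + c[7]); // prints "-1 x"
    }
}
```

The two methods differ only in the array element type, which is exactly the information that is lost when the type is derived from the store node alone.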
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Add comment about mismatched store handling ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24005/files - new: https://git.openjdk.org/jdk/pull/24005/files/b59d2eb2..c0b3cf96 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24005&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24005&range=03-04 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24005.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24005/head:pull/24005 PR: https://git.openjdk.org/jdk/pull/24005 From rcastanedalo at openjdk.org Mon Mar 24 10:09:37 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 24 Mar 2025 10:09:37 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call [v4] In-Reply-To: References: Message-ID: <3Dh1Yg0riju476nUmXdKJFZIdq8ALRt8eDfqA7vCluU=.4650cb56-23d0-4dbc-a6fb-ecdfd03778b5@github.com> On Mon, 24 Mar 2025 09:31:51 GMT, Emanuel Peter wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Relax test cases > > src/hotspot/share/opto/loopTransform.cpp line 3579: > >> 3577: if (msg == nullptr && store->as_Mem()->is_mismatched_access()) { >> 3578: msg = "mismatched store"; >> 3579: } > > What effect does this have? > > Ah, it seems to have to do with these comments in your PR: > `Additionally, the changeset makes it easier to reason about correctness by explicitly disabling the optimization for mismatched stores (where the type of the value to be stored differs from the element type of the destination array). 
Such stores were not optimized before, but only due to pattern matching limitations.` > > It may be good to leave additional comments in the code here, saying that this is a limitation, and maybe improved in the future. Up to you. I agree, done in commit c0b3cf96. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24005#discussion_r2009857132 From epeter at openjdk.org Mon Mar 24 10:21:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 10:21:28 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> Message-ID: On Mon, 24 Mar 2025 06:49:31 GMT, kuaiwei wrote: >> @kuaiwei Just ping me when you would like me to re-review :) > > Hi, @eme64, I changed the code as per the comments and merged with the master branch. Testing shows no issues. Could you help check it again? Thanks. @kuaiwei I have not yet had the time to read through the PR, but I would like to talk about `LoadNode::Ideal`. The idea with `Ideal` in general is that you replace one node with another. After `Ideal` returns, all usages of the old node now take the new node instead. You copied the structure from my MergeStores implementation in `StoreNode::Ideal`. There it made sense to replace `StoreB` nodes that have a memory output with `StoreI` nodes, which also have memory output. But it does not make sense to replace a `LoadB` that has a byte/int output with a `LoadL` that has a long output, for example. I think your implementation should go into `OrINode`, and match the expression up from there. Because we want to replace the old `OrI` with the new `LoadL`. ---------- Another question: Do you have some tests where some of the nodes in the `load/shift/or` expression have other uses? Imagine this: l0 = a[0]; l1 = a[1]; l2 = a[2]; l3 = a[3]; l = ; now use l1 for something else as well What happens now? 
Do you check that we only use the old `LoadB` in the expression we are replacing? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2747597073 From dfenacci at openjdk.org Mon Mar 24 10:22:19 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 24 Mar 2025 10:22:19 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v5] In-Reply-To: References: <4gi1QLJRikwQR2ShA9zy_cOK4NDsjrJK4ZyyuzuNLjc=.924da387-362c-4a39-b4da-0d347c72d354@github.com> Message-ID: On Mon, 24 Mar 2025 09:57:28 GMT, Damon Fenacci wrote: >> src/hotspot/share/opto/callnode.cpp line 1117: >> >>> 1115: if (phase->C->print_inlining()) { >>> 1116: phase->C->inline_printer()->record(cg->method(), cg->call_node()->jvms(), InliningResult::FAILURE, >>> 1117: "static call node changed: trying again"); >> >> FTR, could you share how the PrintInlining output looks now when this code is triggered? > > It looks like this: > > @ 192 jdk.internal.vm.vector.VectorSupport::binaryOp (38 bytes) failed to inline: failed to inline (intrinsic) failed to inline: static call node changed: trying again (intrinsic) late inline succeeded > > > It seems a bit redundant: the first `failed to inline: failed to inline (intrinsic)` doesn't seem to be needed. Actually it is a bit verbose but I would probably leave it like this: the first (`failed to inline: failed to inline (intrinsic)`) is for the failure and the second (`failed to inline: static call node changed: trying again (intrinsic)`) is for the retry. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r2009878651 From epeter at openjdk.org Mon Mar 24 10:24:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 10:24:19 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call [v5] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 10:09:37 GMT, Roberto Castañeda Lozano wrote: >> The [array fill optimization](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/share/opto/loopTransform.cpp#L3533) replaces simple innermost loops that fill an array with copies of the same primitive value: >> >> >> for (int i = 0; i < array.length; i++) { >> array[i] = 0; >> } >> >> with a call to an [array filling intrinsic](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp#L1665) that is specialized for the array element type: >> >> >> arrayof_jint_fill(array, 0, array.length) >> >> >> The optimization retrieves the (basic) array element type from calling `MemNode::memory_type()` on the original filling store. This is incorrect for stores of `short` values, since these are represented by `StoreC` nodes [whose `memory_type()` is `T_CHAR`](https://github.com/openjdk/jdk/blob/1fe45265e446eeca5dc496085928ce20863a3172/src/hotspot/share/opto/memnode.hpp#L680). As a result, the optimization wrongly assigns the address type `char[]` to `short` array fill loops. This can cause miscompilations due to missing anti-dependences, see the [issue description for further detail](https://bugs.openjdk.org/projects/JDK/issues/JDK-8351468). >> >> This changeset proposes retrieving the (basic) array element type from the store address type instead. This ensures that the accurate address type is assigned to the intrinsic call, preventing missed anti-dependences and other potential issues caused by mismatching types. 
Additionally, the changeset makes it easier to reason about correctness by explicitly disabling the optimization for mismatched stores (where the type of the value to be stored differs from the element type of the destination array). Such stores were not optimized before, but only due to pattern matching limitations. >> >> Assuming mismatched stores are discarded (as proposed here), an alternative solution would be to define a StoreS node returning the appropriate `memory_type()`. This could be desirable even as a complement to this fix, to prevent similar bugs in the future. I propose to investigate the introduction of a StoreS node in a separate RFE, because it is a much larger and more intrusive changeset, and go with this minimal, local, and non-intrusive fix for backportability. >> >> **Testing:** tier1-5, stress testing (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Add comment about mismatched store handling Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24005#pullrequestreview-2709863059 From epeter at openjdk.org Mon Mar 24 10:25:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 10:25:26 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v14] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 17:50:20 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. >> >> For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. 
> > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > add comments for encoding and UCF test/hotspot/gtest/x86/x86-asmtest.py line 60: > 58: registers_mapping = { > 59: # skip rax, rsi, rdi, rsp, rbp as they have special encodings > 60: 'rax': {64: 'rax', 32: 'eax', 16: 'ax', 8: 'al'}, You should probably update the copyright year, right? Also: is this script ever run in testing? I didn't know we had python files in the repository. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r2007174854 From duke at openjdk.org Mon Mar 24 10:25:55 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Mon, 24 Mar 2025 10:25:55 GMT Subject: RFR: 8352490: Fatal error message for unhandled bytecode needs more detail Message-ID: Description: Improve the error message for unhandled bytecode in `line#129` of function `Bytecodes::Code ciBytecodeStream::next_wide_or_table` in file ciStream.cpp Solution: The error message is improved to print OPCODE and bytecode index (BCI) ------------- Commit messages: - 8352490: Fatal error message for unhandled bytecode needs more detail Changes: https://git.openjdk.org/jdk/pull/24187/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24187&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352490 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24187.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24187/head:pull/24187 PR: https://git.openjdk.org/jdk/pull/24187 From epeter at openjdk.org Mon Mar 24 10:29:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 10:29:15 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v14] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 17:50:20 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the 
number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. >> >> For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > add comments for encoding and UCF I cannot really review the content, nor can I really test the APX features. But I also cannot directly see any issues with it. Therefore, I'll approve. Once the hardware is available, we will probably discover some issues and have to fix them then. Thanks @vamsi-parasa for the work! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23501#pullrequestreview-2709875746 From rcastanedalo at openjdk.org Mon Mar 24 10:35:18 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 24 Mar 2025 10:35:18 GMT Subject: RFR: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: References: <5kGSsPrFJSWtzlHmoLAv29p49L-TruX_0aS-9DPmPk0=.538f067e-f617-4f29-a941-33f567c45da7@github.com> <7Fzrb4C-4VyJlOMUaaFqhTzlj4o7dVXS8-EkLCiVVA4=.367fe44f-540c-49cc-aa1c-3ab381febd32@github.com> Message-ID: On Fri, 21 Mar 2025 15:15:00 GMT, Emanuel Peter wrote: >>> What would be a better name though? >> >> @merykitty had the suggestions `MemNode::value_type()` or `MemNode::value_basic_type()` (see comment [above](https://github.com/openjdk/jdk/pull/24005#issuecomment-2724445592)), I like both better than the current name. > >> > What would be a better name though? >> >> @merykitty had the suggestions `MemNode::value_type()` or `MemNode::value_basic_type()` (see comment [above](https://github.com/openjdk/jdk/pull/24005#issuecomment-2724445592)), I like both better than the current name. 
> > @merykitty @robcasloz `MemNode::value_basic_type()` sounds like the most descriptive and accurate. Great! @robcasloz , will you file an RFE for that? Thanks for re-reviewing and the additional suggestions @eme64! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24005#issuecomment-2747642151 From epeter at openjdk.org Mon Mar 24 10:37:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 10:37:14 GMT Subject: RFR: 8351627: C2 AArch64 ROR/ROL: assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value [v2] In-Reply-To: References: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> Message-ID: On Tue, 18 Mar 2025 03:51:55 GMT, Xiaohong Gong wrote: >> The following assertion fails on AArch64: >> >> >> Internal Error (jdk/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:2991), pid=3822987, tid=3823007 >> assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value >> >> >> with a simple Vector API case: >> >> public static IntVector test() { >> IntVector iv = IntVector.zero(IntVector.SPECIES_128); >> return iv.lanewise(VectorOperators.ROR, iv); >> } >> >> >> On AArch64, vector `ROR/ROL` (rotate right/left) operations are implemented with a combination of shifts. Please see the pattern for `ROR`: >> >> >> lsr dst1, src, cnt // unsigned right shift >> lsl dst2, src, bitSize - cnt // left shift >> orr dst, dst1, dst2 // logical or >> >> where `bitSize` is the element type width (e.g. `32` for `int`). In above case, `cnt` is a zero constant, resulting in a left shift of 32 (`bitSize - 0`), which exceeds the instruction's valid shift count range and triggers the assertion. To fix this, we need to mask the shift count to ensure it stays within valid range when calculating shift counts for rotate operations: `shiftCnt = shiftCnt & (bitSize - 1)`. >> >> Note that the mask is only necessary for constant shift counts. 
This not only fixes the assertion failure, but also allows `ROR/ROL src, 0` to be optimized to `src` directly. >> >> For vector variables as shift counts, the masking can be safely omitted because: >> 1. Vector shift instructions that take a vector register as the shift count may not automatically apply modulo arithmetic based on register size. When the shift count is `32` for int type, the result may be either `zeros` or `src`. However, this doesn't affect correctness for rotate since the final result is combined with `src` using a logical `OR` operation. >> 2. It saves a vector logical `AND` for masking, which is friendly to the performance. > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Update the test case I think this is ok as is. But I would like @jatin-bhateja to have a quick look as well :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24051#pullrequestreview-2709902501 From duke at openjdk.org Mon Mar 24 11:04:09 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 24 Mar 2025 11:04:09 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> Message-ID: On Mon, 24 Mar 2025 06:49:31 GMT, kuaiwei wrote: >> @kuaiwei Just ping me when you would like me to re-review :) > > Hi, @eme64, I changed the code as per the comments and merged with the master branch. Testing shows no issues. Could you help check it again? Thanks. > @kuaiwei I have not yet had the time to read through the PR, but I would like to talk about `LoadNode::Ideal`. The idea with `Ideal` in general is that you replace one node with another. After `Ideal` returns, all usages of the old node now take the new node instead. > > You copied the structure from my MergeStores implementation in `StoreNode::Ideal`. 
There it made sense to replace `StoreB` nodes that have a memory output with `LoadI` nodes, which also have memory output. > > But it does not make sense to replace a `LoadB` that has a byte/int output with a `LoadL` that has a long output for example. > > I think your implementation should go into `OrINode`, and match the expression up from there. Because we want to replace the old `OrI` with the new `LoadL`. > > Another question: Do you have some tests where some of the nodes in the `load/shift/or` expression have other uses? Imagine this: > > ``` > l0 = a[0]; > l1 = a[1]; > l2 = a[2]; > l3 = a[3]; > l = ; > now use l1 for something else as well > ``` > > What happens now? Do you check that we only use the old `LoadB` in the expression we are replacing? Hi @eme64, I understand your concern. In this patch, I check the uses of all `LoadB` nodes and only allow a merge when each of them has a single use, which must be the `OrNode`. I check the `OrNode` as well. So I think it will not cause trouble. In my previous patch, I tried to extract the value from the merged `LoadNode` if the original `LoadB` has other uses, such as an uncommon trap. You can find it in https://github.com/openjdk/jdk/pull/24023/commits/b621db1cf0c17885516254a2af4b5df43e06c098 by searching for MergePrimitiveLoads::extract_value_for_uncommon_trap. But in my testing with jtreg tier1, it never hit a case that replaced a `LoadB` used by an uncommon trap; I think range check smearing removes all the uncommon trap uses. So I reverted it to keep the code simple. In my opinion, the extract_value function can be used as a general solution for other uses. But we may need a cost model to weigh the cost of the new instructions used for extraction against the benefit of the merged load. To keep things simple, I chose to check uses strictly.
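
For readers following the thread, here is a minimal Java sketch of the kind of source pattern being discussed. It is illustrative only and is not taken from the PR's tests; the class and method names (`MergeLoadsSketch`, `getIntL`, `getIntLWithExtraUse`) are made up for this example. The first method combines four adjacent byte loads with shift/or, which is the shape a merged load could replace. In the second, one of the byte loads has an additional use, which, going by the strict use check described above, would leave the loads unmerged. The Java-level result is the same either way.

```java
public class MergeLoadsSketch {
    // Four adjacent byte loads combined little-endian into one int.
    static int getIntL(byte[] a, int off) {
        return (a[off] & 0xff)
             | ((a[off + 1] & 0xff) << 8)
             | ((a[off + 2] & 0xff) << 16)
             | ((a[off + 3] & 0xff) << 24);
    }

    // Same expression, but the second byte is also used on its own.
    static long getIntLWithExtraUse(byte[] a, int off) {
        byte l1 = a[off + 1];
        int l = (a[off] & 0xff)
              | ((l1 & 0xff) << 8)
              | ((a[off + 2] & 0xff) << 16)
              | ((a[off + 3] & 0xff) << 24);
        return (long) l + l1; // extra use of l1 outside the or-expression
    }

    public static void main(String[] args) {
        byte[] a = {0x78, 0x56, 0x34, 0x12};
        System.out.println(Integer.toHexString(getIntL(a, 0)));
        System.out.println(getIntLWithExtraUse(a, 0));
    }
}
```

A test of the second shape would check that the result stays correct whether or not the merge is applied.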
------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2747730322 From rcastanedalo at openjdk.org Mon Mar 24 11:08:17 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 24 Mar 2025 11:08:17 GMT Subject: Integrated: 8351468: C2: array fill optimization assigns wrong type to intrinsic call In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 09:47:17 GMT, Roberto Castañeda Lozano wrote: > The [array fill optimization](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/share/opto/loopTransform.cpp#L3533) replaces simple innermost loops that fill an array with copies of the same primitive value: > > > for (int i = 0; i < array.length; i++) { > array[i] = 0; > } > > with a call to an [array filling intrinsic](https://github.com/openjdk/jdk/blob/1d147ccb4cfcb1da23664ac941e56ac542a7ac61/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp#L1665) that is specialized for the array element type: > > > arrayof_jint_fill(array, 0, array.length) > > > The optimization retrieves the (basic) array element type from calling `MemNode::memory_type()` on the original filling store. This is incorrect for stores of `short` values, since these are represented by `StoreC` nodes [whose `memory_type()` is `T_CHAR`](https://github.com/openjdk/jdk/blob/1fe45265e446eeca5dc496085928ce20863a3172/src/hotspot/share/opto/memnode.hpp#L680). As a result, the optimization wrongly assigns the address type `char[]` to `short` array fill loops. This can cause miscompilations due to missing anti-dependences, see the [issue description for further detail](https://bugs.openjdk.org/projects/JDK/issues/JDK-8351468). > > This changeset proposes retrieving the (basic) array element type from the store address type instead. This ensures that the accurate address type is assigned to the intrinsic call, preventing missed anti-dependences and other potential issues caused by mismatching types.
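
As a rough illustration of the `short` case described above, the sketch below shows a `short[]` fill loop of the shape the optimization targets, together with a load that has to observe the old array contents before the fill runs (an anti-dependence between the load and the fill). This is not the reproducer from the bug report; the names `ShortFillSketch`, `fill` and `readThenFill` are invented here, and whether C2 actually turns this loop into an intrinsic call depends on the VM and its flags.

```java
public class ShortFillSketch {
    // Simple innermost loop storing the same short value into every element.
    static void fill(short[] array, short value) {
        for (int i = 0; i < array.length; i++) {
            array[i] = value;
        }
    }

    // The load of array[0] must see the value from before the fill,
    // so it may not be reordered after the stores performed by fill().
    static short readThenFill(short[] array, short value) {
        short before = array[0];
        fill(array, value);
        return before;
    }

    public static void main(String[] args) {
        short[] data = {5, 6, 7, 8};
        short before = readThenFill(data, (short) 0);
        System.out.println(before + " " + java.util.Arrays.toString(data));
    }
}
```
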
Additionally, the changeset makes it easier to reason about correctness by explicitly disabling the optimization for mismatched stores (where the type of the value to be stored differs from the element type of the destination array). Such stores were not optimized before, but only due to pattern matching limitations. > > Assuming mismatched stores are discarded (as proposed here), an alternative solution would be to define a StoreS node returning the appropriate `memory_type()`. This could be desirable even as a complement to this fix, to prevent similar bugs in the future. I propose to investigate the introduction of a StoreS node in a separate RFE, because it is a much larger and more intrusive changeset, and go with this minimal, local, and non-intrusive fix for backportability. > > **Testing:** tier1-5, stress testing (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64). This pull request has now been integrated. Changeset: de580090 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/de580090cd9ada313a878975b9f183045d293684 Stats: 489 lines in 3 files changed: 486 ins; 0 del; 3 mod 8351468: C2: array fill optimization assigns wrong type to intrinsic call Reviewed-by: epeter, thartmann, qamai ------------- PR: https://git.openjdk.org/jdk/pull/24005 From mgronlun at openjdk.org Mon Mar 24 11:44:48 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 24 Mar 2025 11:44:48 GMT Subject: RFR: 8352696: JFR: assert(false): EA: missing memory path Message-ID: Greetings, In debug builds, we can sometimes assert because C2's Escape Analysis does not recognize a pattern where one input of memory Phi node is a MergeMem node, and another is a RAW store. This pattern is created by the jdk.jfr.internal.JVM.commit() intrinsic, which is inlined because of inlining JFR events, for example jdk.VirtualThreadStart. As a result, EA complains about a strange memory graph.
Testing: jdk_jfr Thanks Markus ------------- Commit messages: - 8352696 Changes: https://git.openjdk.org/jdk/pull/24192/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24192&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352696 Stats: 125 lines in 2 files changed: 121 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24192.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24192/head:pull/24192 PR: https://git.openjdk.org/jdk/pull/24192 From epeter at openjdk.org Mon Mar 24 11:45:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 11:45:13 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> Message-ID: <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> On Mon, 24 Mar 2025 11:00:43 GMT, kuaiwei wrote: >> Hi, @eme64 , I changed as comments and merge with master branch. It look no issue in testing. Could you help check it again? Thanks. > >> @kuaiwei I have not yet had the time to read through the PR, but I would like to talk about `LoadNode::Ideal`. The idea with `Ideal` in general, is that you replace one node with another. After `Ideal` returns, all usages of the old node now take the new node instead. >> >> You copied the structure from my MergeStores implementation in `StoreNode::Idea`. There it made sense to replace `StoreB` nodes that have a memory output with `LoadI` nodes, which also have memory output. >> >> But it does not make sense to replace a `LoadB` that has a byte/int output with a `LoadL` that has a long output for example. >> >> I think your implementation should go into `OrINode`, and match the expression up from there. Because we want to replace the old `OrI` with the new `LoadL`. 
>> >> Another question: Do you have some tests where some of the nodes in the `load/shift/or` expression have other uses? Imagine this: >> >> ``` >> l0 = a[0]; >> l1 = a[1]; >> l2 = a[2]; >> l3 = a[3]; >> l = ; >> now use l1 for something else as well >> ``` >> >> What happens now? Do you check that we only use the old `LoadB` in the expression we are replacing? > > Hi @eme64 , I understand your concern. In this patch , I check the usage of all `loadB` nodes and only allow they have only single usage into `OrNode`, I also check the `OrNode` as well. So I think it will not cause the trouble. > > > l0 = a[0]; > l1 = a[1]; > l2 = a[2]; > l3 = a[3]; > l = ; > now use l1 for something else as well > > For this case, because l1 has other usage, all these loads will not be merged. > > In my previous patch, I tried to extract value from merged `LoadNode` if origin `loadB` has other usage, such as used by uncommon trap. You can find them in https://github.com/openjdk/jdk/pull/24023/commits/b621db1cf0c17885516254a2af4b5df43e06c098 and search MergePrimitiveLoads::extract_value_for_uncommon_trap . But in my test with jtreg tier1, it never hit a case which replaced `LoadB` used by uncommon trap, I think range check smearing remove all the uncommon trap usages. So I revert it to make code simple. In my opinion, the extract_value function can be used as a general solution for other usages. But we may need a cost model to evaluate cost of new instructions which used for extracting and benefit of merged load. To simplify, I choose to check usage strictly. @kuaiwei Thanks for your response! What about these two things I brought up? > Do you have some tests where some of the nodes in the load/shift/or expression have other uses? It would be good to have these tests, even if we think your code is correct. It is good to verify it with tests. And someone in the future might break it. > I think your implementation should go into OrINode, and match the expression up from there. 
Because we want to replace the old OrI with the new LoadL. This is really the pattern we use in `Idea`. We replace the node at the bottom of an expression with a new node (or new expression). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2747835311 From epeter at openjdk.org Mon Mar 24 11:45:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 11:45:17 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v9] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 04:29:03 GMT, kuaiwei wrote: >> In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine. >> >> The benchmark result of MergeLoadBench.java >> AMD EPYC 9T24 96-Core Processor: >> >> |name | -MergeLoads | +MergeLoads |delta| >> |---|---|---|---| >> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 | >> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 | >> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 | >> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 | >> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 | >> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 | >> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 | >> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 | >> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 | >> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 | >> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 | >> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 | >> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 | >> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 | >> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 | >> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 | >> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 | >> |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 | >> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 | 
>> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 | >> |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 | >> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 | >> |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 | >> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 | >> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 | >> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 | >> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 | >> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 | >> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 | >> |Merg... > > kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into dev/merge_loads > - Fix test > - Add more tests > - Enable StressIGVN and riscv platform > - Change tests as review comments > - Fix test failure and change for review comments > - Revert extract value and add more tests > - Add tests > - Fix test failure > - Remove some debug trace > - ... and 1 more: https://git.openjdk.org/jdk/compare/024633e7...e37c4bf3 src/hotspot/share/opto/memnode.cpp line 2396: > 2394: assert(last_op != nullptr && (last_op->Opcode() == Op_OrI || last_op->Opcode() == Op_OrL), "sanity"); > 2395: _phase->is_IterGVN()->replace_node(last_op, replace); > 2396: _phase->is_IterGVN()->_worklist.push(merged_load); If you did this in `OrNode::Ideal`, then you just have to return the new load, and `IGVN` takes care of the replacing. That is the code pattern we use everywhere else. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2010012746 From thartmann at openjdk.org Mon Mar 24 12:04:13 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 24 Mar 2025 12:04:13 GMT Subject: RFR: 8352696: JFR: assert(false): EA: missing memory path In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 11:35:03 GMT, Markus Grönlund wrote: > Greetings, > > In debug builds, we can sometimes assert because C2's Escape Analysis does not recognize a pattern where one input of memory Phi node is a MergeMem node, and another is a RAW store. > > This pattern is created by the jdk.jfr.internal.JVM.commit() intrinsic, which is inlined because of inlining JFR events, for example jdk.VirtualThreadStart. > > As a result, EA complains about a strange memory graph. > > Testing: jdk_jfr > > Thanks > Markus test/jdk/jdk/jfr/jvm/TestJvmCommitIntrinsicAndEA.java line 85: > 83: public final class TestJvmCommitIntrinsicAndEA { > 84: > 85: public static void main(String[] args) throws Throwable { Quick drive-by comment: The indentation is off, also below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24192#discussion_r2010043449 From chagedorn at openjdk.org Mon Mar 24 12:08:24 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 24 Mar 2025 12:08:24 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v3] In-Reply-To: References: Message-ID: <5WJPojFKKlkAcp93avTRnRQiby4ug48YNOMI34kb00M=.908ad771-1d6e-4011-a709-48f4c26391aa@github.com> On Thu, 20 Mar 2025 07:48:32 GMT, Christian Hagedorn wrote: > That looks reasonable. I've launched some testing and results look good so far (there is quite some load at the moment - will take a bit longer to complete than usual). Testing looked good (did not cover the `TraceLoopOpts` update).
------------- PR Comment: https://git.openjdk.org/jdk/pull/23465#issuecomment-2747897126 From chagedorn at openjdk.org Mon Mar 24 12:08:25 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 24 Mar 2025 12:08:25 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v2] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 14:41:13 GMT, Roland Westrelin wrote: > Added. The TraceLoopOpts crash reproduces: the code hits a malformed counted loop. I tweaked the printing code. Is the malformed counted loop expected or a different issue to look into? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2010045317 From chagedorn at openjdk.org Mon Mar 24 12:08:25 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 24 Mar 2025 12:08:25 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v5] In-Reply-To: <4hKl3zRJ6EP4QA-iuKiEpdwIqFk2-YvrpixAGy_VidU=.e9490e22-7751-41a6-a3e7-202930be570a@github.com> References: <4hKl3zRJ6EP4QA-iuKiEpdwIqFk2-YvrpixAGy_VidU=.e9490e22-7751-41a6-a3e7-202930be570a@github.com> Message-ID: On Fri, 21 Mar 2025 14:46:35 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/macroArrayCopy.cpp line 826: >> >>> 824: } >>> 825: >>> 826: if (is_partial_array_copy) { >> >> Why is this check no longer required? > > ` ArrayCopyNode::may_modify()` performs some pattern matching and needs to be in sync with the shape of the array copy once expanded. If that shape changes then ` ArrayCopyNode::may_modify()` needs to be adjusted. The code you points to was added when the shape of the expanded array copy was changed to avoid a complicated update to the pattern matching in ` ArrayCopyNode::may_modify()`. What I propose is to get rid of the pattern matching because it's fragile and to instead always use the trick from that change where the final `MemBarNode` is marked, so make it unconditional. 
Makes sense, thanks for the explanation! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2010043557 From dfenacci at openjdk.org Mon Mar 24 12:10:53 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 24 Mar 2025 12:10:53 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v8] In-Reply-To: References: Message-ID: > # Issue > > The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because the IR check for the test `testAndMaskSameValue1` expects 1 `AndV` node in the C2-compiled code but finds none. > > # Cause > > The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type. > The graph that leads to the issue looks like this: > ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) > The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: > ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) > The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead. > The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`. > > This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match.
> > # Solution > > Instead of fixing this specific case we try a more generic approach: when late inlining we keep track of failed intrinsics and re-examine them during IGVN. If the `Ideal` method for their call node is called, we reschedule the intrinsic attempt for that call. > > # Testing > > Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) > > Tests: Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8302459: add comments to asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21682/files - new: https://git.openjdk.org/jdk/pull/21682/files/cfa5252b..f84cafbe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21682&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21682&range=06-07 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21682/head:pull/21682 PR: https://git.openjdk.org/jdk/pull/21682 From epeter at openjdk.org Mon Mar 24 12:13:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 12:13:14 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: <8Cws-Ux-7E1BLFqNQ_2rHNVe4qZfT1Ob1I4ylxjJH3U=.0fccdf27-46f6-45aa-9a6c-8c3e4d1e76f4@github.com> On Mon, 24 Mar 2025 02:07:41 GMT, Xiaohong Gong wrote: >>> @XiaohongGong Could you please also merge here before I rerun the testing? >> >> Sure and have rebased. Thanks a lot for your testing! > >> @XiaohongGong Tests launched! Please ping me after the weekend for the results ? 
> > Hi @eme64 , thanks for your testing! May I ask about the test results please? @XiaohongGong Testing passed :green_circle: ------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2747914733 From thartmann at openjdk.org Mon Mar 24 12:14:25 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 24 Mar 2025 12:14:25 GMT Subject: RFR: 8352696: JFR: assert(false): EA: missing memory path In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 11:35:03 GMT, Markus Gr?nlund wrote: > Greetings, > > In debug builds, we can sometimes assert because C2's Escape Analysis does not recognize a pattern where one input of memory Phi node is a MergeMem node, and another is a RAW store. > > This pattern is created by the jdk.jfr.internal.JVM.commit() intrinsic, which is inlined because of inlining JFR events, for example jdk.VirtualThreadStart. > > As a result, EA complains about a strange memory graph. > > Testing: jdk_jfr > > Thanks > Markus Looks good to me otherwise. test/jdk/jdk/jfr/jvm/TestJvmCommitIntrinsicAndEA.java line 79: > 77: * @bug 8352696 > 78: * @requires vm.flagless > 79: * @requires vm.hasJFR & vm.debug Could be merged. test/jdk/jdk/jfr/jvm/TestJvmCommitIntrinsicAndEA.java line 81: > 79: * @requires vm.hasJFR & vm.debug > 80: * @library /test/lib /test/jdk > 81: * @run main/othervm jdk.jfr.jvm.TestJvmCommitIntrinsicAndEA Maybe use `-Xbatch`? ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24192#pullrequestreview-2710145599 PR Review Comment: https://git.openjdk.org/jdk/pull/24192#discussion_r2010058606 PR Review Comment: https://git.openjdk.org/jdk/pull/24192#discussion_r2010046632 From epeter at openjdk.org Mon Mar 24 12:16:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 12:16:12 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors [v6] In-Reply-To: References: Message-ID: <8BIupopdiUx27ULwn_fw0iTeHy4kb0UVSWllINuViuc=.d22554c1-83d0-4521-b679-bc9bf9d7a978@github.com> On Thu, 20 Mar 2025 07:13:43 GMT, Xiaohong Gong wrote: >> The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. >> >> Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. >> >> This patch added the rearrange support for vector types with small lane count. Here are the main changes: >> - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) >> - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation >> - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one >> >> Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. 
Performance of the same JMH with other vector types remains unchanged. >> >> 1) NEON >> >> JMH on panama-vector:vectorIntrinsics: >> >> Benchmark (size) Mode Cnt Units Before After Gain >> Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x >> Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x >> Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x >> Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x >> Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x >> Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x >> Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x >> Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x >> Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x >> Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x >> Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x >> Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.... > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge branch 'master' into JDK-8350463 > - Use a smaller warmup and array length in IR test > - Update IR test based on the review comment > - Merge branch 'jdk:master' into JDK-8350463 > - Add the IR test > - 8350463: AArch64: Add vector rearrange support for small lane count vectors > > The AArch64 vector rearrange implementation currently lacks support for > vector types with lane counts < 4 (see [1]). This limitation results in > significant performance gaps when running Long/Double vector benchmarks > on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to > other SVE and x86 platforms. > > Vector rearrange operations depend on vector shuffle inputs, which used > byte array as payload previously. 
The minimum vector lane count of 4 for > byte type on AArch64 imposed this limitation on rearrange operations. > However, vector shuffle payload has been updated to use vector-specific > data types (e.g., `int` for `IntVector`) (see [2]). This change enables > us to remove the lane count restriction for vector rearrange operations. > > This patch added the rearrange support for vector types with small lane > count. Here are the main changes: > - Added AArch64 match rule support for `VectorRearrange` with smaller > lane counts (e.g., `2D/2S`) > - Relocated NEON implementation from ad file to c2 macro assembler file > for better handling of complex implementation > - Optimized temporary register usage in NEON implementation for > short/int/float types from two registers to one > > Following is the performance improvement data of several Vector API JMH > benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the > same JMH with other vector types remains unchanged. > > 1) NEON > > JMH on panama-vector:vectorIntrinsics: > ``` > Benchmark (size) Mode Cnt Units Before After Gain > Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x > Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x > Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x > Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x > Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x > Float64Vector.unsliceUnary 1024 thrpt ... I don't know about aarch64 instructions specifically to review this in depth, but it looks reasonable. Testing looks good too. ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23790#pullrequestreview-2710173259 From thartmann at openjdk.org Mon Mar 24 12:18:12 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 24 Mar 2025 12:18:12 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v8] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 12:10:53 GMT, Damon Fenacci wrote: >> # Issue >> >> The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because the IR check for the test `testAndMaskSameValue1` expects 1 `AndV` node in the C2-compiled code but finds none. >> >> # Cause >> >> The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type. >> The graph that leads to the issue looks like this: >> ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) >> The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: >> ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) >> The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead. >> The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`. >> >> This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match.
>> >> # Solution >> >> Instead of fixing this specific case we try a more generic approach: when late inlining we keep track of failed intrinsics and re-examine them during IGVN. If the `Ideal` method for their call node is called, we reschedule the intrinsic attempt for that call. >> >> # Testing >> >> Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) >> >> Tests: Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8302459: add comments to asserts Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21682#pullrequestreview-2710179086 From dfenacci at openjdk.org Mon Mar 24 12:18:12 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 24 Mar 2025 12:18:12 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v5] In-Reply-To: <09Nnwhd1-0I_uMC6zFEvJYny8Ysf008kxUV-hmA0mc0=.4070a496-c18b-4415-bb26-9b3d81923f87@github.com> References: <4gi1QLJRikwQR2ShA9zy_cOK4NDsjrJK4ZyyuzuNLjc=.924da387-362c-4a39-b4da-0d347c72d354@github.com> <3TUaCIHwDAl8dK1hATRK8m5XZIK1oeY8231x1HaLl3s=.ac07bdce-7807-461f-8d5a-906d50d1c411@github.com> <09Nnwhd1-0I_uMC6zFEvJYny8Ysf008kxUV-hmA0mc0=.4070a496-c18b-4415-bb26-9b3d81923f87@github.com> Message-ID: On Mon, 24 Mar 2025 08:47:22 GMT, Tobias Hartmann wrote: >> The difference is whether a call can be scheduled for a repeated inlining attempt in the future. >> >> `cg->call_node()->set_generator(cg)` reinitializes `cg` in `CallNode` and lets IGVN to submit it for incremental inlining during future passes. 
>>
>> The first check guards against a situation when the call node is already on the IGVN list (so, it will be automatically rescheduled for inlining during the next IGVN pass, causing an infinite loop in incremental inlining).
>>
>> The second assert catches a suspicious situation when the call node disappears from the IGVN worklist during a failed inlining attempt. IMO it should not happen, hence the assert. But it is benign to allow repeated inlining in such a case.
>
> Okay, thanks for the background. If we keep both asserts, some comments explaining this would be helpful.

Thanks for the suggestion!
I've added a couple of comments to the asserts.
I've also changed the second assert into `assert(!is_scheduled_for_igvn_before || is_scheduled_for_igvn_after, "call node removed from IGVN list during inlining pass");`. The final result is the same but the expression matches what the message says 1-1.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r2010063495

From thartmann at openjdk.org Mon Mar 24 12:18:13 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 24 Mar 2025 12:18:13 GMT
Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v5]
In-Reply-To:
References: <4gi1QLJRikwQR2ShA9zy_cOK4NDsjrJK4ZyyuzuNLjc=.924da387-362c-4a39-b4da-0d347c72d354@github.com>
 <3TUaCIHwDAl8dK1hATRK8m5XZIK1oeY8231x1HaLl3s=.ac07bdce-7807-461f-8d5a-906d50d1c411@github.com>
 <09Nnwhd1-0I_uMC6zFEvJYny8Ysf008kxUV-hmA0mc0=.4070a496-c18b-4415-bb26-9b3d81923f87@github.com>
Message-ID:

On Mon, 24 Mar 2025 12:14:11 GMT, Damon Fenacci wrote:

>> Okay, thanks for the background. If we keep both asserts, some comments explaining this would be helpful.
>
> Thanks for the suggestion!
> I've added a couple of comments to the asserts.
> I've also changed the second assert into `assert(!is_scheduled_for_igvn_before || is_scheduled_for_igvn_after, "call node removed from IGVN list during inlining pass");`. The final result is the same but the expression matches what the message says 1-1.

Looks good to me, thanks!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r2010065659

From epeter at openjdk.org Mon Mar 24 12:19:14 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 24 Mar 2025 12:19:14 GMT
Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 20 Mar 2025 07:06:31 GMT, Xiaohong Gong wrote:

>> Since PR [1] has added several new vector operations in VectorAPI and the X86 backend implementation for them, this patch adds the AArch64 backend part for NEON/SVE architectures.
>>
>> The performance of the Vector API related JMH micro benchmarks can improve by about 70x ~ 95x on an NVIDIA Grace CPU, which is a 128-bit vector length sve2 architecture, with different UseSVE options.
>> Here are the gain details:
>>
>>
>> Benchmark                  (size)   Mode  Cnt  -XX:UseSVE=0  -XX:UseSVE=1  -XX:UseSVE=2
>> ByteMaxVector.SADD           1024  thrpt   30        80.69x        79.70x       80.534x
>> ByteMaxVector.SADDMasked     1024  thrpt   30        84.08x        85.72x       85.901x
>> ByteMaxVector.SSUB           1024  thrpt   30        80.46x        80.27x       81.063x
>> ByteMaxVector.SSUBMasked     1024  thrpt   30        83.96x        85.26x       85.887x
>> ByteMaxVector.SUADD          1024  thrpt   30        80.43x        80.36x       81.761x
>> ByteMaxVector.SUADDMasked    1024  thrpt   30        83.40x        84.62x       85.199x
>> ByteMaxVector.SUSUB          1024  thrpt   30        79.93x        79.22x       79.714x
>> ByteMaxVector.SUSUBMasked    1024  thrpt   30        82.93x        85.02x       84.726x
>> ByteMaxVector.UMAX           1024  thrpt   30        78.73x        77.39x       78.220x
>> ByteMaxVector.UMAXMasked     1024  thrpt   30        82.62x        84.77x       85.531x
>> ByteMaxVector.UMIN           1024  thrpt   30        79.04x        77.80x       78.471x
>> ByteMaxVector.UMINMasked     1024  thrpt   30        83.11x        84.86x       86.126x
>> IntMaxVector.SADD            1024  thrpt   30        83.11x        83.07x       83.183x
>> IntMaxVector.SADDMasked      1024  thrpt   30        90.67x        91.80x       93.162x
>> IntMaxVector.SSUB            1024  thrpt   30        83.37x        82.82x       83.317x
>> IntMaxVector.SSUBMasked      1024  thrpt   30        90.85x        92.87x       94.201x
>> IntMaxVector.SUADD           1024  thrpt   30        82.76x        81.78x       82.679x
>> IntMaxVector.SUADDMasked     1024  thrpt   30        90.49x        91.93x       93.155x
>> IntMaxVector.SUSUB           1024  thrpt   30        82.92x        82.34x       82.525x
>> IntMaxVector.SUSUBMasked     1024  thrpt   30        90.60x        92.12x       92.951x
>> IntMaxVector.UMAX            1024  thrpt   30        82.40x        81.85x       82.242x
>> IntMaxVector.UMAXMasked      1024  thrpt   30        90.30x        92.10x       92.587x
>> IntMaxVector.UMIN            1024  thrpt   30        82.84x        81.43x       82.801x
>> IntMaxVector.UMINMasked      1024  thrpt   30        90.43x        91.49x       92.678x
>> LongMaxVector.SADD 102...
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase.
> The pull request now contains four commits:
>
>  - Merge branch 'jdk:master' into JDK_8349522
>  - Fix IR test failure on X64 with UseAVX=1
>  - Merge branch 'jdk:master' into JDK_8349522
>  - 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations
>
>    Since PR [1] has added several new vector operations in VectorAPI
>    and the X86 backend implementation for them, this patch adds the
>    AArch64 backend part for NEON/SVE architectures.
>
>    The performance of the Vector API related JMH micro benchmarks can
>    improve by about 70x ~ 95x on an AArch64 128-bit vector length sve2
>    architecture with different UseSVE options. Here are the uplift
>    details:
>
>    ```
>    Benchmark                  (size)   Mode  Cnt  -XX:UseSVE=0  -XX:UseSVE=1  -XX:UseSVE=2
>    ByteMaxVector.SADD           1024  thrpt   30        80.69x        79.70x       80.534x
>    ByteMaxVector.SADDMasked     1024  thrpt   30        84.08x        85.72x       85.901x
>    ByteMaxVector.SSUB           1024  thrpt   30        80.46x        80.27x       81.063x
>    ByteMaxVector.SSUBMasked     1024  thrpt   30        83.96x        85.26x       85.887x
>    ByteMaxVector.SUADD          1024  thrpt   30        80.43x        80.36x       81.761x
>    ByteMaxVector.SUADDMasked    1024  thrpt   30        83.40x        84.62x       85.199x
>    ByteMaxVector.SUSUB          1024  thrpt   30        79.93x        79.22x       79.714x
>    ByteMaxVector.SUSUBMasked    1024  thrpt   30        82.93x        85.02x       84.726x
>    ByteMaxVector.UMAX           1024  thrpt   30        78.73x        77.39x       78.220x
>    ByteMaxVector.UMAXMasked     1024  thrpt   30        82.62x        84.77x       85.531x
>    ByteMaxVector.UMIN           1024  thrpt   30        79.04x        77.80x       78.471x
>    ByteMaxVector.UMINMasked     1024  thrpt   30        83.11x        84.86x       86.126x
>    IntMaxVector.SADD            1024  thrpt   30        83.11x        83.07x       83.183x
>    IntMaxVector.SADDMasked      1024  thrpt   30        90.67x        91.80x       93.162x
>    IntMaxVector.SSUB            1024  thrpt   30        83.37x        82.82x       83.317x
>    IntMaxVector.SSUBMasked      1024  thrpt   30        90.85x        92.87x       94.201x
>    IntMaxVector.SUADD           1024  thrpt   30        82.76x        81.78x       82.679x
>    IntMaxVector.SUADDMasked     1024  thrpt   30        90.49x        91.93x       93.155x
>    IntMaxVector.SUSUB           1024  thrpt   30        82.92x        82.34x       82.525x
>    IntMaxVector.SUSUBMasked     1024  thrpt ...

I cannot review the aarch64 instructions in detail, but it looks reasonable.
Testing is passing too :green_circle:

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23608#pullrequestreview-2710180758

From epeter at openjdk.org Mon Mar 24 12:19:15 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 24 Mar 2025 12:19:15 GMT
Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2]
In-Reply-To: <8gRkivkxdlGCezJE_ZtvkO7ONzLpIpzY0PXT-6MBNI8=.9719afc9-94e9-45ed-a2d4-63e6a3593402@github.com>
References: <5fk0CfImI-utRMfmsA78i_uQJjoej9bsLqxsAbRZHLk=.b5aece2b-090d-4fbc-aa64-3acf8fedc41b@github.com>
 <8gRkivkxdlGCezJE_ZtvkO7ONzLpIpzY0PXT-6MBNI8=.9719afc9-94e9-45ed-a2d4-63e6a3593402@github.com>
Message-ID: <2twhpJnhbQPC7I4jJGVlawsY9EkT8ZCYwa6xUxRTUls=.81ca56f4-7534-4bab-b98b-25252c0c7977@github.com>

On Mon, 24 Mar 2025 02:08:33 GMT, Xiaohong Gong wrote:

>> Hi @eme64 I've rebased this PR. Thanks a lot for your testing!
>
>> @XiaohongGong Testing launched! Please ping me after the weekend for the results ;)
>
> Thanks for your testing @eme64 . May I ask about the test results please?

Thanks @XiaohongGong for the work!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23608#issuecomment-2747933003

From epeter at openjdk.org Mon Mar 24 12:20:11 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 24 Mar 2025 12:20:11 GMT
Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors
In-Reply-To:
References:
Message-ID:

On Mon, 24 Mar 2025 02:07:41 GMT, Xiaohong Gong wrote:

>>> @XiaohongGong Could you please also merge here before I rerun the testing?
>>
>> Sure and have rebased. Thanks a lot for your testing!
>
>> @XiaohongGong Tests launched! Please ping me after the weekend for the results
>
> Hi @eme64 , thanks for your testing! May I ask about the test results please?

Thanks @XiaohongGong for the work!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2747933736

From mgronlun at openjdk.org Mon Mar 24 12:22:10 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Mon, 24 Mar 2025 12:22:10 GMT
Subject: RFR: 8352696: JFR: assert(false): EA: missing memory path [v2]
In-Reply-To:
References:
Message-ID:

> Greetings,
>
> In debug builds, we can sometimes assert because C2's Escape Analysis does not recognize a pattern where one input of memory Phi node is a MergeMem node, and another is a RAW store.
>
> This pattern is created by the jdk.jfr.internal.JVM.commit() intrinsic, which is inlined because of inlining JFR events, for example jdk.VirtualThreadStart.
>
> As a result, EA complains about a strange memory graph.
>
> Testing: jdk_jfr
>
> Thanks
> Markus

Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision:

 - merged requires
 - indentation

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24192/files
  - new: https://git.openjdk.org/jdk/pull/24192/files/8e4185c2..e072153b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24192&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24192&range=00-01

  Stats: 8 lines in 1 file changed: 0 ins; 3 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/24192.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24192/head:pull/24192

PR: https://git.openjdk.org/jdk/pull/24192

From mgronlun at openjdk.org Mon Mar 24 12:22:11 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Mon, 24 Mar 2025 12:22:11 GMT
Subject: RFR: 8352696: JFR: assert(false): EA: missing memory path [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 24 Mar 2025 12:11:31 GMT, Tobias Hartmann wrote:

> Looks good to me otherwise.

Thanks so much for taking a look @TobiHartmann !

> test/jdk/jdk/jfr/jvm/TestJvmCommitIntrinsicAndEA.java line 81:
>
>> 79:  * @requires vm.hasJFR & vm.debug
>> 80:  * @library /test/lib /test/jdk
>> 81:  * @run main/othervm jdk.jfr.jvm.TestJvmCommitIntrinsicAndEA
>
> Maybe use `-Xbatch`?

What would it give?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24192#issuecomment-2747938310
PR Review Comment: https://git.openjdk.org/jdk/pull/24192#discussion_r2010065975

From thartmann at openjdk.org Mon Mar 24 12:24:15 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 24 Mar 2025 12:24:15 GMT
Subject: RFR: 8352490: Fatal error message for unhandled bytecode needs more detail
In-Reply-To:
References:
Message-ID:

On Mon, 24 Mar 2025 10:19:50 GMT, Saranya Natarajan wrote:

> Description: Improve the error message for unhandled bytecode in `line#129` of function `Bytecodes::Code ciBytecodeStream::next_wide_or_table` in file ciStream.cpp
>
> Solution: The error message is improved to print OPCODE and bytecode index (BCI)

Should we also print the name of the current method (via `ciBytecodeStream::method()`)? With inlining, it might be different from the root of the compilation.

-------------

PR Review: https://git.openjdk.org/jdk/pull/24187#pullrequestreview-2710193309

From mgronlun at openjdk.org Mon Mar 24 12:26:12 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Mon, 24 Mar 2025 12:26:12 GMT
Subject: RFR: 8352696: JFR: assert(false): EA: missing memory path [v2]
In-Reply-To:
References:
Message-ID: <5ASdY5jtvoX2wp-xEWbHSe_fLBiRuWklQra9T0MYG4U=.55746d71-6689-4212-8443-6a2cace9a8f1@github.com>

On Mon, 24 Mar 2025 12:16:02 GMT, Markus Grönlund wrote:

>> test/jdk/jdk/jfr/jvm/TestJvmCommitIntrinsicAndEA.java line 81:
>>
>>> 79:  * @requires vm.hasJFR & vm.debug
>>> 80:  * @library /test/lib /test/jdk
>>> 81:  * @run main/othervm jdk.jfr.jvm.TestJvmCommitIntrinsicAndEA
>>
>> Maybe use `-Xbatch`?
>
> What would it give?

Disabling background compilation makes it more deterministic?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24192#discussion_r2010076673

From duke at openjdk.org Mon Mar 24 12:28:13 2025
From: duke at openjdk.org (kuaiwei)
Date: Mon, 24 Mar 2025 12:28:13 GMT
Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4]
In-Reply-To:
References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com>
Message-ID:

On Mon, 24 Mar 2025 11:00:43 GMT, kuaiwei wrote:

>> Hi, @eme64 , I made the changes suggested in the comments and merged with the master branch. Testing shows no issues. Could you help check it again? Thanks.
>
>> @kuaiwei I have not yet had the time to read through the PR, but I would like to talk about `LoadNode::Ideal`. The idea with `Ideal` in general, is that you replace one node with another. After `Ideal` returns, all usages of the old node now take the new node instead.
>>
>> You copied the structure from my MergeStores implementation in `StoreNode::Ideal`. There it made sense to replace `StoreB` nodes that have a memory output with `StoreI` nodes, which also have memory output.
>>
>> But it does not make sense to replace a `LoadB` that has a byte/int output with a `LoadL` that has a long output for example.
>>
>> I think your implementation should go into `OrINode`, and match the expression up from there. Because we want to replace the old `OrI` with the new `LoadL`.
>>
>> Another question: Do you have some tests where some of the nodes in the `load/shift/or` expression have other uses? Imagine this:
>>
>> ```
>> l0 = a[0];
>> l1 = a[1];
>> l2 = a[2];
>> l3 = a[3];
>> l = ;
>> now use l1 for something else as well
>> ```
>>
>> What happens now? Do you check that we only use the old `LoadB` in the expression we are replacing?
>
> Hi @eme64 , I understand your concern. In this patch, I check the usage of all `LoadB` nodes and only allow them to have a single use, which goes into the `OrNode`. I also check the `OrNode` as well.
> So I think it will not cause trouble.
>
>
> l0 = a[0];
> l1 = a[1];
> l2 = a[2];
> l3 = a[3];
> l = ;
> now use l1 for something else as well
>
> For this case, because l1 has another use, none of these loads will be merged.
>
> In my previous patch, I tried to extract the value from the merged `LoadNode` if the original `LoadB` had other uses, such as being used by an uncommon trap. You can find this in https://github.com/openjdk/jdk/pull/24023/commits/b621db1cf0c17885516254a2af4b5df43e06c098 by searching for MergePrimitiveLoads::extract_value_for_uncommon_trap . But in my testing with jtreg tier1, it never hit a case which replaced a `LoadB` used by an uncommon trap; I think range check smearing removes all the uncommon trap usages. So I reverted it to keep the code simple. In my opinion, the extract_value function can be used as a general solution for other usages. But we may need a cost model to weigh the cost of the new instructions used for extraction against the benefit of the merged load. To simplify, I chose to check usage strictly.

> @kuaiwei Thanks for your response!
>
> What about these two things I brought up?
>
> > Do you have some tests where some of the nodes in the load/shift/or expression have other uses?
>
> It would be good to have these tests, even if we think your code is correct. It is good to verify it with tests. And someone in the future might break it.
>
> > I think your implementation should go into OrINode, and match the expression up from there. Because we want to replace the old OrI with the new LoadL.
>
> This is really the pattern we use in `Ideal`. We replace the node at the bottom of an expression with a new node (or new expression).

Tests will be added. I will try to move the optimization to OrNode::Ideal. Thanks for your suggestion.
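
As an illustration of the load/shift/or shape discussed in this thread, here is a minimal C++ sketch at source level. It is not C2 code and not the patch under review: the function names `load_u32_bytewise` and `load_u32_merged` are hypothetical, and the sketch assumes little-endian byte order for the byte-wise form. The point is only that the four byte loads feed nothing but the or-expression, which is the condition that makes replacing them with one wide load safe.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: four adjacent byte loads combined with shift/or in
// little-endian order. Each loaded byte is used only by the or-expression.
inline uint32_t load_u32_bytewise(const uint8_t* a) {
  assert(a != nullptr);
  return  static_cast<uint32_t>(a[0])
       | (static_cast<uint32_t>(a[1]) << 8)
       | (static_cast<uint32_t>(a[2]) << 16)
       | (static_cast<uint32_t>(a[3]) << 24);
}

// Hypothetical merged form: a single 4-byte load in host byte order.
// It yields the same value as load_u32_bytewise only on a little-endian host.
inline uint32_t load_u32_merged(const uint8_t* a) {
  assert(a != nullptr);
  uint32_t v;
  std::memcpy(&v, a, sizeof(v));
  return v;
}
```

If one of the bytes, say `a[1]`, were also needed elsewhere, the merged form would have to extract it again from the wide value, which is the cost/benefit question raised above.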

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2747955613

From mgronlun at openjdk.org Mon Mar 24 12:34:53 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Mon, 24 Mar 2025 12:34:53 GMT
Subject: RFR: 8352696: JFR: assert(false): EA: missing memory path [v3]
In-Reply-To:
References:
Message-ID:

> Greetings,
>
> In debug builds, we can sometimes assert because C2's Escape Analysis does not recognize a pattern where one input of memory Phi node is a MergeMem node, and another is a RAW store.
>
> This pattern is created by the jdk.jfr.internal.JVM.commit() intrinsic, which is inlined because of inlining JFR events, for example jdk.VirtualThreadStart.
>
> As a result, EA complains about a strange memory graph.
>
> Testing: jdk_jfr
>
> Thanks
> Markus

Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:

  Xbatch

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24192/files
  - new: https://git.openjdk.org/jdk/pull/24192/files/e072153b..0ba8abfb

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24192&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24192&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/24192.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24192/head:pull/24192

PR: https://git.openjdk.org/jdk/pull/24192

From shade at openjdk.org Mon Mar 24 12:34:54 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 24 Mar 2025 12:34:54 GMT
Subject: RFR: 8352696: JFR: assert(false): EA: missing memory path [v2]
In-Reply-To:
References:
Message-ID: <73iiU00Ahqm8b_DiUS97EgegNnSJh-mxJEqUUSGJSZg=.8dd13ae9-2d35-4130-8350-ad0b9f7bf695@github.com>

On Mon, 24 Mar 2025 12:22:10 GMT, Markus Grönlund wrote:

>> Greetings,
>>
>> In debug builds, we can sometimes assert because C2's Escape Analysis does not recognize a pattern where one input of memory Phi node is a
>> MergeMem node, and another is a RAW store.
>>
>> This pattern is created by the jdk.jfr.internal.JVM.commit() intrinsic, which is inlined because of inlining JFR events, for example jdk.VirtualThreadStart.
>>
>> As a result, EA complains about a strange memory graph.
>>
>> Testing: jdk_jfr
>>
>> Thanks
>> Markus
>
> Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision:
>
>  - merged requires
>  - indentation

src/hotspot/share/opto/library_call.cpp line 3179:

> 3177:   store_to_memory(control(), java_buffer_pos_offset, next_pos_X, T_LONG, MemNode::release);
> 3178: #else
> 3179:   store_to_memory(control(), java_buffer_pos_offset, next_pos_X, T_INT, MemNode::release);

BTW, you can write this with less duplication, like:

    store_to_memory(control(), java_buffer_pos_offset, next_pos_X, LP64_ONLY(T_LONG) NOT_LP64(T_INT), MemNode::release);

(I wish there was a `X`-style macro like `MakeConX`, but I don't think there is one for `BasicType`)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24192#discussion_r2010087366

From mgronlun at openjdk.org Mon Mar 24 12:34:54 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Mon, 24 Mar 2025 12:34:54 GMT
Subject: RFR: 8352696: JFR: assert(false): EA: missing memory path [v2]
In-Reply-To: <73iiU00Ahqm8b_DiUS97EgegNnSJh-mxJEqUUSGJSZg=.8dd13ae9-2d35-4130-8350-ad0b9f7bf695@github.com>
References: <73iiU00Ahqm8b_DiUS97EgegNnSJh-mxJEqUUSGJSZg=.8dd13ae9-2d35-4130-8350-ad0b9f7bf695@github.com>
Message-ID:

On Mon, 24 Mar 2025 12:29:52 GMT, Aleksey Shipilev wrote:

>> Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - merged requires
>>  - indentation
>
> src/hotspot/share/opto/library_call.cpp line 3179:
>
>> 3177:   store_to_memory(control(), java_buffer_pos_offset, next_pos_X, T_LONG, MemNode::release);
>> 3178: #else
>> 3179:   store_to_memory(control(), java_buffer_pos_offset,
next_pos_X, T_INT, MemNode::release);
>
> BTW, you can write this with less duplication, like:
>
>
> store_to_memory(control(), java_buffer_pos_offset, next_pos_X, LP64_ONLY(T_LONG) NOT_LP64(T_INT), MemNode::release);
>
>
> (I wish there was a `X`-style macro like `MakeConX`, but I don't think there is one for `BasicType`)

Nice!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24192#discussion_r2010089591

From epeter at openjdk.org Mon Mar 24 12:37:22 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 24 Mar 2025 12:37:22 GMT
Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v2]
In-Reply-To:
References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com>
Message-ID:

On Wed, 19 Mar 2025 14:36:29 GMT, Christian Hagedorn wrote:

>> The patch fixes the issue of creating an Initialized Assertion Predicate at a loop X from a Template Assertion Predicate that was originally created for a loop Y. Using the unrelated loop values from loop Y for the Initialized Assertion Predicate will let it fail during runtime and we execute a `halt` instruction. This was originally reported with [JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428).
>>
>> Note that most of the line changes are from new tests.
>>
>> ### The Problem
>> There are multiple test cases triggering the same problem. In the following, when referring to "the test case", I'm referring to `testTemplateAssertionPredicateNotRemovedHalt()` which was written from scratch and contains more detailed comments explaining how we end up executing a `Halt` node.
>>
>> #### An Inner Loop without Parse Predicates
>> The graph in `testTemplateAssertionPredicateNotRemovedHalt()` looks like this after creating `LoopNodes` for the outer `for` and inner `while (true)` loop:
>>
>> ![image](https://github.com/user-attachments/assets/7ac60e35-0b7e-4f04-b9dd-6eb8c8654a15)
>>
>> We only have Parse Predicates for the outer loop. Why?
>>
>> Before beautify loop, we have the following region which merges multiple backedges - the one from the `for` loop and the one from the `while (true)` loop:
>>
>> ![image](https://github.com/user-attachments/assets/7895161d-5ac1-46d6-93fe-5ab90ef24ab9)
>>
>> In `IdealLoopTree::merge_many_backedges()`, we notice that the hottest backedge is hot enough that it is worth having a separate merge point region for the inner and outer loop. We set everything up and eventually in `IdealLoopTree::split_outer_loop()`, we create a second `LoopNode`.
>>
>> For this inner `LoopNode`, we cannot set up `Parse Predicates` with the same UCTs as used for the outer loop. It would be incorrect when taking the trap to re-execute the inner and outer loop again while having already executed some of the outer loop's iterations. Thus, we get the graph shape with back-to-back `LoopNodes` as shown above.
>>
>> #### Predicates from a Folded Loop End up at Another Loop
>> As described in the previous section, we have an inner and outer `LoopNode` while the inner does not have Parse Predicates. In a series of events (see test case comments for more details), we first hoist a range check out of the outer loop during Loop Predication with a Template Assertion Predicate. Then, we fold the outer loop away because we find that it is...
>
> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase.
> The pull request now contains six commits:
>
>  - Small things
>  - Fix test comments
>  - New approach: Marking unrelated Template Assertion Predicates outside of IGVN by storing a reference to the loop it originally was created for.
>  - Merge branch 'master' into JDK-8350579
>  - Revert fix completely
>  - 8350579: Remove Template Assertion Predicates belonging to a
>    loop once it is folded away during IGVN

Nice work @chhagedorn !

src/hotspot/share/opto/loopTransform.cpp line 1706:

> 1704:   // Compute the value of the loop induction variable at the end of the
> 1705:   // first iteration of the unrolled loop: init + new_stride_con - init_inc
> 1706:   int unrolled_stride_con = stride_con_before_unroll * 2;

Could we assert that `stride_con_before_unroll == main_loop_head->stride_con()`?

src/hotspot/share/opto/predicates.cpp line 1267:

> 1265:     template_assertion_predicate.opaque_node()->mark_useful();
> 1266:   }
> 1267: }

I'm not sure if it makes sense to split this into two methods, but that's subjective. It seems to me that the code in `visit` is an optimization for what happens in `mark_template_useful_if_matching_loop`, and does not really make sense on its own.

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23823#pullrequestreview-2710199784
PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2010077164
PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2010092615

From epeter at openjdk.org Mon Mar 24 12:37:22 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 24 Mar 2025 12:37:22 GMT
Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v2]
In-Reply-To:
References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com>
Message-ID:

On Mon, 24 Mar 2025 12:23:38 GMT, Emanuel Peter wrote:

>> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase.
>> The pull request now contains six commits:
>>
>>  - Small things
>>  - Fix test comments
>>  - New approach: Marking unrelated Template Assertion Predicates outside of IGVN by storing a reference to the loop it originally was created for.
>>  - Merge branch 'master' into JDK-8350579
>>  - Revert fix completely
>>  - 8350579: Remove Template Assertion Predicates belonging to a
>>    loop once it is folded away during IGVN
>
> src/hotspot/share/opto/loopTransform.cpp line 1706:
>
>> 1704:   // Compute the value of the loop induction variable at the end of the
>> 1705:   // first iteration of the unrolled loop: init + new_stride_con - init_inc
>> 1706:   int unrolled_stride_con = stride_con_before_unroll * 2;
>
> Could we assert that `stride_con_before_unroll == main_loop_head->stride_con()`?

If not, could we assert something similar?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2010080552

From mgronlun at openjdk.org Mon Mar 24 12:39:56 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Mon, 24 Mar 2025 12:39:56 GMT
Subject: RFR: 8352696: JFR: assert(false): EA: missing memory path [v4]
In-Reply-To:
References:
Message-ID:

> Greetings,
>
> In debug builds, we can sometimes assert because C2's Escape Analysis does not recognize a pattern where one input of memory Phi node is a MergeMem node, and another is a RAW store.
>
> This pattern is created by the jdk.jfr.internal.JVM.commit() intrinsic, which is inlined because of inlining JFR events, for example jdk.VirtualThreadStart.
>
> As a result, EA complains about a strange memory graph.
>
> Testing: jdk_jfr
>
> Thanks
> Markus

Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision:

  fold

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24192/files
  - new: https://git.openjdk.org/jdk/pull/24192/files/0ba8abfb..c63fb608

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24192&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24192&range=02-03

  Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/24192.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24192/head:pull/24192

PR: https://git.openjdk.org/jdk/pull/24192

From epeter at openjdk.org Mon Mar 24 12:44:12 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 24 Mar 2025 12:44:12 GMT
Subject: RFR: 8350609: Cleanup unknown unwind opcode (0xB) for windows
In-Reply-To:
References:
Message-ID: <9Q_DYd2P7ku3Em_cM-uCfa34lWnJsJ5n4vaHBHqEbfY=.c3e7cc7d-864c-41c6-9a7a-83a663e1fcb7@github.com>

On Thu, 20 Feb 2025 03:58:17 GMT, Dhamoder Nalla wrote:

> This PR cleans up the unknown unwind opcodes (0xB) in the Windows intrinsic functions introduced in commit https://github.com/openjdk/jdk17u-dev/commit/9f05c411e6d6bdf612cf0cf8b9fe4ca9ecde50d1#diff-a024df6bcd94607260545e647922261703a652dee1afadb1fa758f6e74a568d1
>
> ![image](https://github.com/user-attachments/assets/5b295365-ba8e-4fd6-8b8b-f7243f80a496)
>
> According to the Windows unwind opcodes outlined at https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170#unwind-operation-code, the opcode 0xB (1011) is not a valid opcode, as the valid opcodes range from 0 to 10.
>
> Tests performed:
> 1. tier1 tests
> 2. Vector tests under /test/jdk/incubator/vector

Testing passed. The patch looks reasonable :)

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23707#pullrequestreview-2710247828

From duke at openjdk.org Mon Mar 24 13:09:02 2025
From: duke at openjdk.org (Marc Chevalier)
Date: Mon, 24 Mar 2025 13:09:02 GMT
Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v17]
In-Reply-To:
References:
Message-ID:

> This collapses two consecutive left shifts by constants into a single shift: (x << con1) << con2 => x << (con1 + con2). Care must be taken in the case where con1 + con2 is greater than or equal to the number of bits in the integer type. In this case, we must simplify to 0.
>
> Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32-bit integer into a 16-bit integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simple `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR look like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitudes. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed so as not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference.
>
> This also works for multiplications by powers of 2 since they are already translated into shifts.
>
> Thanks,
> Marc

Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 31 additional commits since the last revision:

 - Merge branch 'master' into fix/missing-transformation-for-chain-of-shifts-multiplications-by-constants
 - Rephrase comment
 - more checks
 - order
 - rephrase
 - correct
 - s
 - rephrased corner case
 - rephrase
 - char -> byte
 - ... and 21 more: https://git.openjdk.org/jdk/compare/50ec8d0a...cd0b0c09

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23728/files
  - new: https://git.openjdk.org/jdk/pull/23728/files/124d9382..cd0b0c09

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=16
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23728&range=15-16

  Stats: 66044 lines in 1237 files changed: 32505 ins; 21388 del; 12151 mod
  Patch: https://git.openjdk.org/jdk/pull/23728.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23728/head:pull/23728

PR: https://git.openjdk.org/jdk/pull/23728

From azafari at openjdk.org Mon Mar 24 13:33:49 2025
From: azafari at openjdk.org (Afshin Zafari)
Date: Mon, 24 Mar 2025 13:33:49 GMT
Subject: RFR: 8352141: UBSAN: fix the left shift of negative value in relocInfo.cpp, internal_word_Relocation::pack_data_to()
Message-ID:

The `offset` variable used in the left-shift operation can be a large number with its sign bit set. This makes it a negative value, and left-shifting a negative value is UB. Using the `java_shift_left()` function is the workaround to avoid the UB. This function uses reinterpret_cast to cast from signed to unsigned and back.

Tests:
linux-x64-debug tier1 on a UBSAN enabled build.
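
A minimal sketch of the idea, for illustration only: the helper name `shift_left_wrapping` is hypothetical and this is not HotSpot's implementation. It assumes a 32-bit value and Java-style masking of the shift count to its low 5 bits, and it uses `static_cast` rather than `reinterpret_cast`. The shift itself is done on an unsigned value, where it is well defined; the conversion back to signed is modular since C++20 and implementation-defined (two's complement on mainstream compilers) before that.

```cpp
#include <cstdint>

// Hypothetical helper: left shift with wrap-around semantics and no UB for
// negative inputs. The shift is performed in the unsigned domain.
inline int32_t shift_left_wrapping(int32_t value, int32_t count) {
  const uint32_t bits = static_cast<uint32_t>(value);
  // Mask the count to 0..31 so that an oversized count is not UB either.
  const uint32_t shifted = bits << (static_cast<uint32_t>(count) & 31u);
  return static_cast<int32_t>(shifted);
}
```

With this shape, `shift_left_wrapping(-1, 4)` shifts the bit pattern `0xFFFFFFFF` and yields `-16`, whereas the expression `-1 << 4` on a signed `int` is what UBSAN reports.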
-------------

Commit messages:
 - 8352141: UBSAN: fix the left shift of negative value in relocInfo.cpp, internal_word_Relocation::pack_data_to()

Changes: https://git.openjdk.org/jdk/pull/24196/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24196&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8352141
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/24196.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24196/head:pull/24196

PR: https://git.openjdk.org/jdk/pull/24196

From rcastanedalo at openjdk.org Mon Mar 24 13:43:23 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 24 Mar 2025 13:43:23 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v10]
In-Reply-To:
References:
Message-ID:

On Wed, 26 Feb 2025 13:15:53 GMT, Daniel Lundén wrote:

>> If a method has a large number of parameters, we currently bail out from C2 compilation.
>>
>> ### Changeset
>>
>> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>>
>> Changes:
>> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
>> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
>>
>> ### Testing
>>
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
>> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no...
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove accidental leftover #endif

src/hotspot/share/opto/regmask.hpp line 188:

> 186: // all registers/stack locations under _lwm and over _hwm are excluded.
> 187: // The exception is (s10, s11, ...), where the value is decided solely by
> 188: // _all_stack, regardless of the value of _hwm.
This comment illustrates the case with `_offset = 0`, I think it would be useful to extend it with an example where `_offset > 0`. Here is a suggestion: https://github.com/openjdk/jdk/commit/8377012ac485a70703921822d58bc535bafb7a0a. Feel free to merge as-is or edit to your liking, if you agree. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2010206315 From duke at openjdk.org Mon Mar 24 14:02:26 2025 From: duke at openjdk.org (Marc Chevalier) Date: Mon, 24 Mar 2025 14:02:26 GMT Subject: RFR: 8352595: Regression of JDK-8314999 in IR matching [v2] In-Reply-To: References: Message-ID: > A lot of tests for the IR framework used `ALLOC` and friends as a check that would run on the Opto assembly by default, but can also run earlier, but that's no longer the case. > > There were two kinds of tests to fix: the ones rather about `ALLOC`, where the used or expected compile phases have to change, and the tests where `ALLOC` were just a check that would run on opto assembly. For this, I tried to keep the spirit of the test using other regexes made for this stage. 
Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision:

 - wip
 - wip

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24163/files
  - new: https://git.openjdk.org/jdk/pull/24163/files/17e11d4a..b125df8a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24163&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24163&range=00-01

  Stats: 37 lines in 2 files changed: 4 ins; 0 del; 33 mod
  Patch: https://git.openjdk.org/jdk/pull/24163.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24163/head:pull/24163

PR: https://git.openjdk.org/jdk/pull/24163

From duke at openjdk.org Mon Mar 24 14:08:13 2025
From: duke at openjdk.org (Marc Chevalier)
Date: Mon, 24 Mar 2025 14:08:13 GMT
Subject: RFR: 8352595: Regression of JDK-8314999 in IR matching [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 24 Mar 2025 14:02:26 GMT, Marc Chevalier wrote:

>> A lot of tests for the IR framework used `ALLOC` and friends as a check that would run on the Opto assembly by default, but can also run earlier, but that's no longer the case.
>>
>> There were two kinds of tests to fix: the ones rather about `ALLOC`, where the used or expected compile phases have to change, and the tests where `ALLOC` were just a check that would run on opto assembly. For this, I tried to keep the spirit of the test using other regexes made for this stage.
>
> Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision:
>
> - wip
> - wip

Yes, I had indeed misunderstood which tests were specifically about ALLOC and which were about OptoAssembly, with ALLOC being just a means. The new diff makes more sense.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24163#issuecomment-2748247271 From epeter at openjdk.org Mon Mar 24 14:11:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 14:11:51 GMT Subject: RFR: 8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops Message-ID: This was a fuzzer failure, which hit an assert in SuperWord: `# assert(_cl->is_multiversion_fast_loop() == (_multiversioning_fast_proj != nullptr)) failed: must find the multiversion selector IFF loop is a multiversion fast loop` We had a fast main loop, but it could not find the `multiversion_if`. The reason was that the loop was a `PeelMainPost` loop, i.e. there is no pre-loop but only a single peeled iteration. This makes the pattern matching from main-loop via pre-loop to `multiversion_if` impossible. I'm proposing two changes in this PR: - We must check `peel_only`, to see if we are in a `PeelMainPost` or `PreMainPost` case, and only do multiversioning if we know that there will be a pre-loop. - In `eliminate_useless_multiversion_if` we should already detect that a main-loop that is marked as multiversioned should be able to find its `multiversion_if`. I'm removing its multiversioning marking if we cannot find the `multiversion_if`. I added 2 tests: - The fuzzer generated test that hits the assert before this patch. - An IR test that checks that we do not multiversion in a `PeelMainPost` loop case. --------------- **FYI**: I tried to add an assert in `eliminate_useless_multiversion_if` that we must always find the `multiversion_if` from a multiversioned main loop. But there are cases where this can fail. 
Here is an example:
`test/hotspot/jtreg/compiler/locks/TestSynchronizeWithEmptyBlock.java`
With flags:
`-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`

Counted Loop: N537/N176 counted [int,100),+1 (-1 iters)
Loop: N0/N0 has_sfpt
Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 495 }
Loop: N536/N535
Loop: N537/N176 counted [int,100),+1 (-1 iters) has_sfpt strip_mined
Loop: N379/N383 limit_check profile_predicated predicated counted [int,int),+1 (4 iters) pre rc has_sfpt
Loop: N353/N357 counted [int,1000),+1 (4 iters) post rc has_sfpt
Multiversion Loop: N537/N176 counted [int,100),+1 (100 iters) has_sfpt strip_mined
PreMainPost Loop: N537/N176 counted [int,100),+1 (100 iters) multiversion_fast has_sfpt strip_mined
Unroll 2 Loop: N537/N176 counted [int,100),+1 (100 iters) main multiversion_fast has_sfpt strip_mined
Poor node estimate: 306 >> 92
Loop: N0/N0 has_sfpt
Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 }
Loop: N556/N557 sfpts={ 559 }
Loop: N552/N554 counted [int,100),+1 (100 iters) multiversion_delayed_slow has_sfpt strip_mined
Loop: N599/N601 counted [int,int),+1 (4 iters) pre multiversion_fast has_sfpt
Loop: N536/N535 sfpts={ 538 }
Loop: N629/N176 counted [int,99),+2 (100 iters) main multiversion_fast has_sfpt strip_mined
Loop: N575/N577 counted [int,100),+1 (4 iters) post multiversion_fast has_sfpt
Loop: N379/N383 limit_check profile_predicated predicated counted [int,int),+1 (4 iters) pre rc has_sfpt
Loop: N353/N357 counted [int,1000),+1 (4 iters) post rc has_sfpt
Parallel IV: 643 Loop: N552/N554 counted [int,100),+1 (100 iters) multiversion_delayed_slow has_sfpt strip_mined
Parallel IV: 646 Loop: N599/N601 counted [int,int),+1 (4 iters) pre multiversion_fast has_sfpt
Parallel IV: 652 Loop: N629/N176 counted [int,99),+2 (100 iters) main multiversion_fast has_sfpt strip_mined
Parallel IV: 649 Loop: N575/N577 counted [int,100),+1 (4 iters) post multiversion_fast has_sfpt
Loop: N0/N0 has_sfpt
Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 }
Loop: N556/N557 sfpts={ 559 }
Loop: N552/N554 counted [int,100),+1 (100 iters) multiversion_delayed_slow has_sfpt strip_mined
Loop: N599/N601 counted [int,int),+1 (4 iters) pre multiversion_fast has_sfpt
Loop: N536/N535 sfpts={ 538 }
Loop: N629/N176 counted [int,99),+2 (100 iters) main multiversion_fast has_sfpt strip_mined
Loop: N575/N577 counted [int,100),+1 (4 iters) post multiversion_fast has_sfpt
Loop: N379/N383 limit_check profile_predicated predicated counted [int,int),+1 (4 iters) pre rc has_sfpt
Loop: N353/N357 counted [int,1000),+1 (4 iters) post rc has_sfpt
Empty without zero trip guard Loop: N552/N554 counted [int,100),+1 (100 iters) multiversion_delayed_slow has_sfpt strip_mined
Peel Loop: N552/N554 counted [int,100),+1 (100 iters) multiversion_delayed_slow has_sfpt strip_mined
Empty without zero trip guard Loop: N599/N601 counted [int,int),+1 (4 iters) pre multiversion_fast has_sfpt
Peel Loop: N599/N601 counted [int,int),+1 (4 iters) pre multiversion_fast has_sfpt
Unroll 4 Loop: N629/N176 counted [int,99),+2 (100 iters) main multiversion_fast has_sfpt strip_mined

It seems that we are able to detect some loops as empty loops, including the pre-loop. But somehow the main-loop is not removed by "empty loop", and now this main-loop cannot traverse through the pre-loop to the `multiversion_if`. If reviewers think this really should be investigated, I could file a follow-up RFE.
------------- Commit messages: - Merge branch 'master' into JDK-8352587-Multiversion-PeelMainPost - rm assert - peel-main-post IR test - the fix - JDK-8352587 Changes: https://git.openjdk.org/jdk/pull/24183/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24183&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352587 Stats: 138 lines in 4 files changed: 134 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24183.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24183/head:pull/24183 PR: https://git.openjdk.org/jdk/pull/24183 From hgreule at openjdk.org Mon Mar 24 14:15:12 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 24 Mar 2025 14:15:12 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations [v3] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 12:36:23 GMT, Emanuel Peter wrote: >> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: >> >> add tests for ReverseBytesS/ReverseBytesUS > > Nice idea! Thanks for the work :) @eme64 could you take another look? Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23851#issuecomment-2748276063 From duke at openjdk.org Mon Mar 24 14:19:37 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Mon, 24 Mar 2025 14:19:37 GMT Subject: RFR: 8352490: Fatal error message for unhandled bytecode needs more detail [v2] In-Reply-To: References: Message-ID: > Description: Improve the error message for unhandled bytecode in `line#129` of function `Bytecodes::Code ciBytecodeStream::next_wide_or_table` in file ciStream.cpp > > Solution: The error message is improved to print OPCODE and bytecode index (BCI) Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: adding information for printing current method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24187/files - new: https://git.openjdk.org/jdk/pull/24187/files/51a5df91..cd029a9a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24187&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24187&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24187.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24187/head:pull/24187 PR: https://git.openjdk.org/jdk/pull/24187 From duke at openjdk.org Mon Mar 24 14:28:52 2025 From: duke at openjdk.org (Johannes Graham) Date: Mon, 24 Mar 2025 14:28:52 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v42] In-Reply-To: References: Message-ID: > An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. > > This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. 
A static method was separated out to facilitate direct unit-testing. The change also (subjectively) simplifies the calculation of the upper bound and adds an explanation of the reasoning behind it.
>
> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items:
> - Bounds optimization of xor
> - A check for `x ^ x = 0`
> - Explicit testing of xor over booleans.
>
> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as values at the high and low ends of the affected types.
>
> ---------
> ### Progress
> - [x] Change must not contain extraneous whitespace
> - [x] Commit message must refer to an issue
> - [ ] Change must be properly reviewed (2 reviews required, with at least 2 [Reviewers](https://openjdk.org/bylaws#reviewer))
>
> ### Reviewers
> * [Quan Anh Mai](https://openjdk.org/census#qamai) (@merykitty - Committer): re-review required (review applies to [cf779497](https://git.openjdk.org/jdk/pull/23089/files/cf77949776f7a4601268c7291a5743c2eb164186))
>
> ### Reviewing
>
> Using git
>
> Checkout this PR locally: \
> `$ git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089` \
> `$ git checkout pull/23089`
>
> Update a local copy of the PR: \
> `$ git checkout pull/23089` \
> `$ git pull https://git.openjdk.org/jdk.git pull/23089/head`
>
> Using Skara CLI tools
>
> Checkout this PR locally: \
> `$ git pr checkout 23089`
>
> View PR using the GUI difftool: \
> `$ git pr show -t 23089`
>
> Using diff file
>
> Download this PR as a diff file: \
> https://git.openjdk.org/jdk/pull/23089.diff
>
> Using Webrev
>
> [Link to Webrev Comment](https://git.openjdk.org/jdk/pull/23089#issuecomment-2593992282)
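A small Java sketch of the kind of bound and exhaustive check the description talks about. The bound used here is an assumption made for illustration (for non-negative inputs, `x ^ y` cannot set a bit above the highest bit of either upper limit) and is not claimed to be the exact formula in the patch; `XorBoundSketch` and `xorUpperBound` are hypothetical names.

```java
class XorBoundSketch {
    // Upper bound for x ^ y, given 0 <= x <= hiX and 0 <= y <= hiY.
    // Assumed bound for illustration: all bits up to the highest set bit
    // of (hiX | hiY) may be set in the result, and none above it.
    static int xorUpperBound(int hiX, int hiY) {
        if (hiX < 0 || hiY < 0) {
            throw new IllegalArgumentException("upper limits must be non-negative");
        }
        int top = Integer.highestOneBit(hiX | hiY);
        // For top == 1 << 30 the subtraction wraps around to Integer.MAX_VALUE,
        // which is the intended result.
        return top == 0 ? 0 : (top << 1) - 1;
    }

    public static void main(String[] args) {
        // Exhaustive check over all 4-bit upper limits.
        for (int hiX = 0; hiX < 16; hiX++) {
            for (int hiY = 0; hiY < 16; hiY++) {
                int bound = xorUpperBound(hiX, hiY);
                for (int x = 0; x <= hiX; x++) {
                    for (int y = 0; y <= hiY; y++) {
                        if ((x ^ y) > bound) {
                            throw new AssertionError(x + " ^ " + y + " exceeds " + bound);
                        }
                    }
                }
            }
        }
        // The x ^ x = 0 identity, including values at both ends of the int range.
        int[] samples = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            if ((x ^ x) != 0) {
                throw new AssertionError("x ^ x must be 0 for " + x);
            }
        }
        System.out.println("xor bound holds for all 4-bit ranges");
    }
}
```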
Johannes Graham has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits: - Merge branch 'openjdk:master' into xor_const - Merge branch 'openjdk:master' into xor_const - invert comparison in tests - update bug numbers and summary - add test of random ranges - consistency - Merge branch 'openjdk:master' into xor_const - widen range of test values; add missing comment - a few more tests - add comments Co-authored-by: Emanuel Peter - ... and 47 more: https://git.openjdk.org/jdk/compare/02a4ce23...06537f21 ------------- Changes: https://git.openjdk.org/jdk/pull/23089/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=41 Stats: 527 lines in 5 files changed: 476 ins; 25 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/23089.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089 PR: https://git.openjdk.org/jdk/pull/23089 From duke at openjdk.org Mon Mar 24 14:35:36 2025 From: duke at openjdk.org (Marc Chevalier) Date: Mon, 24 Mar 2025 14:35:36 GMT Subject: RFR: 8352595: Regression of JDK-8314999 in IR matching [v3] In-Reply-To: References: Message-ID: > A lot of tests for the IR framework used `ALLOC` and friends as a check that would run on the Opto assembly by default, but can also run earlier, but that's no longer the case. > > There were two kinds of tests to fix: the ones rather about `ALLOC`, where the used or expected compile phases have to change, and the tests where `ALLOC` were just a check that would run on opto assembly. For this, I tried to keep the spirit of the test using other regexes made for this stage. Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision:

 - Merge branch 'master' into fix/fix-IRframework-test
 - wip
 - wip
 - Fix TestCompilePhaseCollector.java
 - Fix TestPhaseIRMatching.java
 - Fix TestBadFormat

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24163/files
  - new: https://git.openjdk.org/jdk/pull/24163/files/b125df8a..c7c0cfa6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24163&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24163&range=01-02

  Stats: 3524 lines in 81 files changed: 1193 ins; 1841 del; 490 mod
  Patch: https://git.openjdk.org/jdk/pull/24163.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24163/head:pull/24163

PR: https://git.openjdk.org/jdk/pull/24163

From dlunden at openjdk.org Mon Mar 24 15:33:34 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 24 Mar 2025 15:33:34 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v11]
In-Reply-To:
References:
Message-ID: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com>

> If a method has a large number of parameters, we currently bail out from C2 compilation.
>
> ### Changeset
>
> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>
> Changes:
> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this.
To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Extend example with offset register mask

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20404/files
  - new: https://git.openjdk.org/jdk/pull/20404/files/e370f61f..fbfddb29

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=09-10

  Stats: 20 lines in 1 file changed: 20 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20404.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404

PR: https://git.openjdk.org/jdk/pull/20404

From chagedorn at openjdk.org Mon Mar 24 15:34:12 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 24 Mar 2025 15:34:12 GMT
Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v2]
In-Reply-To:
References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com>
Message-ID: <_O9CAKoVeIShhjn4R82yvaypwt6tkAAmNWEPFCO88lE=.ac481d78-94aa-40c5-983b-86ad360cdd1c@github.com>

On Wed, 19 Mar 2025 14:36:29 GMT, Christian Hagedorn wrote:

>> The patch fixes the issue of creating an Initialized Assertion Predicate at a loop X from a Template Assertion Predicate that was originally created for a loop Y. Using the unrelated loop values from loop Y for the Initialized Assertion Predicate will let it fail during runtime and we execute a `halt` instruction. This was originally reported with [JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428).
>>
>> Note that most of the line changes are from new tests.
>>
>> ### The Problem
>> There are multiple test cases triggering the same problem.
In the following, when referring to "the test case", I'm referring to `testTemplateAssertionPredicateNotRemovedHalt()` which was written from scratch and contains more detailed comments explaining how we end up with executing a `Halt` node in more details. >> >> #### An Inner Loop without Parse Predicates >> The graph in `testTemplateAssertionPredicateNotRemovedHalt()` looks like this after creating `LoopNodes` for the outer `for` and inner `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7ac60e35-0b7e-4f04-b9dd-6eb8c8654a15) >> >> We only have Parse Predicates for the outer loop. Why? >> >> Before beautify loop, we have the following region which merges multiple backedges - the one from the `for` loop and the one from the `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7895161d-5ac1-46d6-93fe-5ab90ef24ab9) >> >> In `IdealLoopTree::merge_many_backedges()`, we notice that the hottest backedge is hot enough such that it is worth to have a separate merge point region for the inner and outer loop. We set everything up and eventually in `IdealLoopTree::split_outer_loop()`, we create a second `LoopNode`. >> >> For this inner `LoopNode`, we cannot set up `Parse Predicates` with the same UCTs as used for the outer loop. It would be incorrect when taking the trap to re-execute the inner and outer loop again while having already executed some of the outer loop's iterations. Thus, we get the graph shape with back-to-back `LoopNodes` as shown above. >> >> #### Predicates from a Folded Loop End up at Another Loop >> As described in the previous section, we have an inner and outer `LoopNode` while the inner does not have Parse Predicates. In a series of events (see test case comments for more details), we first hoist a range check out of the outer loop during Loop Predication with a Template Assertion Predicate. Then, we fold the outer loop away because we find that it is... 
> > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Small things > - Fix test comments > - New approach: Marking unrelated Template Assertion Predicates outside of IGVN by storing a reference to the loop it originally was created for. > - Merge branch 'master' into JDK-8350579 > - Revert fix completely > - 8350579: Remove Template Assertion Predicates belonging to a > loop once it is folded away during IGVN Thanks Emanuel for your review and comments! ------------- PR Review: https://git.openjdk.org/jdk/pull/23823#pullrequestreview-2710768924 From chagedorn at openjdk.org Mon Mar 24 15:34:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 24 Mar 2025 15:34:15 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v2] In-Reply-To: References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: On Mon, 24 Mar 2025 12:25:29 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopTransform.cpp line 1706: >> >>> 1704: // Compute the value of the loop induction variable at the end of the >>> 1705: // first iteration of the unrolled loop: init + new_stride_con - init_inc >>> 1706: int unrolled_stride_con = stride_con_before_unroll * 2; >> >> Could we assert that `stride_con_before_unroll == main_loop_head->stride_con()`? > > If not, could we assert something similar? I thought about somehow asserting here that as well. But the problem is that at this point, we already concatenated the original and the new loop together to represent one round of unrolling. So, we do not find the original loop exit check anymore from which we could have read the stride. That's why I explicitly take the cached `stride_con_before_unroll` and double it here. We could have maybe cached the original loop exit node somehow to query it. 
But I don't think it adds much value since it's as good as the original stride which was read from the loop exit node.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2010417724

From chagedorn at openjdk.org Mon Mar 24 15:34:18 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 24 Mar 2025 15:34:18 GMT
Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v2]
In-Reply-To:
References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com>
Message-ID: <3vNSau6-nBLDkGGPoq2ijj4lSKAlB4KQfW-Ys3heuTA=.98670082-0972-4921-8991-a9dfa39cfa14@github.com>

On Mon, 24 Mar 2025 12:33:33 GMT, Emanuel Peter wrote:

>> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>>
>> - Small things
>> - Fix test comments
>> - New approach: Marking unrelated Template Assertion Predicates outside of IGVN by storing a reference to the loop it originally was created for.
>> - Merge branch 'master' into JDK-8350579
>> - Revert fix completely
>> - 8350579: Remove Template Assertion Predicates belonging to a
>>   loop once it is folded away during IGVN
>
> src/hotspot/share/opto/predicates.cpp line 1267:
>
>> 1265: template_assertion_predicate.opaque_node()->mark_useful();
>> 1266: }
>> 1267: }
>
> I'm not sure if it makes sense to split this into two methods, but that's subjective.
>
> It seems to me that the code in `visit` is an optimization for what happens in `mark_template_useful_if_matching_loop`, and does not really make sense on its own.

The reasons I've split it are the following:
- The bailout for non-counted loops is actually separate from the marking. So I have a two-step algorithm: bailout + marking which can nicely be split.
- Having `mark_template_useful_if_matching_loop()` allows me to quickly read `visit()` and understand what's going on.
Additionally, I can put the details about why we do the marking in the method comment for more interested code readers. Without the extracted method, I would probably need to put an extra "mark template useful if matching loop" comment + the 6 lines of comments at `mark_template_useful_if_matching_loop()` into the `visit()` method, which makes it harder to grasp.

I would prefer to stick to what I have now - but I admit it's a subjective matter :-)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2010396527

From dlunden at openjdk.org Mon Mar 24 15:38:18 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 24 Mar 2025 15:38:18 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v10]
In-Reply-To:
References:
Message-ID:

On Mon, 24 Mar 2025 13:40:57 GMT, Roberto Castañeda Lozano wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Remove accidental leftover #endif
>
> src/hotspot/share/opto/regmask.hpp line 188:
>
>> 186: // all registers/stack locations under _lwm and over _hwm are excluded.
>> 187: // The exception is (s10, s11, ...), where the value is decided solely by
>> 188: // _all_stack, regardless of the value of _hwm.
>
> This comment illustrates the case with `_offset = 0`, I think it would be useful to extend it with an example where `_offset > 0`. Here is a suggestion: https://github.com/openjdk/jdk/commit/8377012ac485a70703921822d58bc535bafb7a0a. Feel free to merge as-is or edit to your liking, if you agree.

Looks good to me, now merged. There are likely other opportunities for more source code comment illustrations throughout `regmask.hpp`. `SUBTRACT_inner` and `overlap` come to mind, in particular. I'll have a look and see what can be improved.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2010426174 From sparasa at openjdk.org Mon Mar 24 15:41:12 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 24 Mar 2025 15:41:12 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v14] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 20:29:41 GMT, Srinivas Vamsi Parasa wrote: >>> > @vamsi-parasa I tried to launch testing, but my script fails because of some merge issue. Would you mind merging from master? >>> >>> Hi Emanuel (@eme64), please see the updated code after the merge with master. >> >> Hi Emanuel (@eme64), could you please let me know if you're still seeing script failure? > >> @vamsi-parasa I launched testing now. Please ping me after the weekend for results :) > > Thank you, Emanuel! > @vamsi-parasa Testing looks good / no related test failures. Thank you very much Emanuel! (@eme64) :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2748562878 From epeter at openjdk.org Mon Mar 24 15:44:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 15:44:16 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v42] In-Reply-To: References: Message-ID: <9V-aL5zZgNXWDXlHl3QB8brQGPhIKRmX7kXdlp2Z6lo=.cb88be68-f15d-4fd9-a4e0-be7952731e2f@github.com> On Mon, 24 Mar 2025 15:39:31 GMT, Emanuel Peter wrote: >> Johannes Graham has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits: >> >> - Merge branch 'openjdk:master' into xor_const >> - Merge branch 'openjdk:master' into xor_const >> - invert comparison in tests >> - update bug numbers and summary >> - add test of random ranges >> - consistency >> - Merge branch 'openjdk:master' into xor_const >> - widen range of test values; add missing comment >> - a few more tests >> - add comments >> >> Co-authored-by: Emanuel Peter >> - ... 
and 47 more: https://git.openjdk.org/jdk/compare/02a4ce23...06537f21 > > test/hotspot/jtreg/compiler/c2/irTests/XorINodeIdealizationTests.java line 385: > >> 383: public int testRandomLimits(int x, int y) { >> 384: x = RANGE_1.clamp(x); >> 385: y = RANGE_2.clamp(y); > > Question: > did you verify that this `RANGE_1` with its clamp values are really detected as constants by C2, and not seen as loads? Maybe that works, but I've never tried it myself. I just hope that the abstraction here does not invalidate our intent to have constant min/max bounds for the clamping ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23089#discussion_r2010435990 From epeter at openjdk.org Mon Mar 24 15:44:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 15:44:16 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v42] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 14:28:52 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. >> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. 
It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. >> >> --------- >> ### Progress >> - [x] Change must not contain extraneous whitespace >> - [x] Commit message must refer to an issue >> - [ ] Change must be properly reviewed (2 reviews required, with at least 2 [Reviewers](https://openjdk.org/bylaws#reviewer)) >> >> >> >> ### Reviewers >> * [Quan Anh Mai](https://openjdk.org/census#qamai) (@merykitty - Committer) Re-review required (review applies to [cf779497](https://git.openjdk.org/jdk/pull/23089/files/cf77949776f7a4601268c7291a5743c2eb164186)) >> >> ### Reviewing >>
Using git >> >> Checkout this PR locally: \ >> `$ git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089` \ >> `$ git checkout pull/23089` >> >> Update a local copy of the PR: \ >> `$ git checkout pull/23089` \ >> `$ git pull https://git.openjdk.org/jdk.git pull/23089/head` >> >>
>>
Using Skara CLI tools >> >> Checkout this PR locally: \ >> `$ git pr checkout 23089` >> >> View PR using the GUI difftool: \ >> `$ git pr show -t 23089` >> >>
>>
Using diff file >> >> Download this PR as a diff file: \ >> https://git.openjdk.org/jdk/pull/23089.diff >> >>
>>
Using Webrev >> >> [Link to Webrev Comment](https://git.openjdk.org/jdk/pull/23089#issuecomment-25939... > > Johannes Graham has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits: > > - Merge branch 'openjdk:master' into xor_const > - Merge branch 'openjdk:master' into xor_const > - invert comparison in tests > - update bug numbers and summary > - add test of random ranges > - consistency > - Merge branch 'openjdk:master' into xor_const > - widen range of test values; add missing comment > - a few more tests > - add comments > > Co-authored-by: Emanuel Peter > - ... and 47 more: https://git.openjdk.org/jdk/compare/02a4ce23...06537f21 test/hotspot/jtreg/compiler/c2/irTests/XorINodeIdealizationTests.java line 385: > 383: public int testRandomLimits(int x, int y) { > 384: x = RANGE_1.clamp(x); > 385: y = RANGE_2.clamp(y); Question: did you verify that this `RANGE_1` with its clamp values are really detected as constants by C2, and not seen as loads? Maybe that works, but I've never tried it myself. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23089#discussion_r2010433450 From sparasa at openjdk.org Mon Mar 24 15:47:37 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 24 Mar 2025 15:47:37 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v15] In-Reply-To: References: Message-ID: > The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. > > For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. 
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Update copyright for x86-asmtest.py ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23501/files - new: https://git.openjdk.org/jdk/pull/23501/files/e9369a40..87d08a08 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=13-14 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23501/head:pull/23501 PR: https://git.openjdk.org/jdk/pull/23501 From dfenacci at openjdk.org Mon Mar 24 15:48:20 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 24 Mar 2025 15:48:20 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v5] In-Reply-To: References: Message-ID: <1jAnOoibn5X9PMRRgBSRtl8ECJyvh7i3EYTvf7adKow=.d5964399-918b-4529-94c5-9d93d5b2e8f1@github.com> On Fri, 21 Mar 2025 14:07:45 GMT, Roland Westrelin wrote: >> The `arraycopy` writes to a non escaping array so its `ArrayCopy` node >> is marked as having a narrow memory effect. One of the loads from the >> destination after the copy is transformed into a load from the source >> array (the rationale being that if there's no load from the >> destination of the copy, the `arraycopy` is not needed). The load from >> the source has the input memory state of the `ArrayCopy` as memory >> input. That load is then sunk out of the loop and its control is >> updated to be after the `ArrayCopy`. That's legal because the >> `ArrayCopy` only has a narrow memory effect and can't modify the >> source. The `ArrayCopy` can't be eliminated and is expanded. In the >> process, a `MemBar` that has a wide memory effect is added. 
The load >> from the source has control after the membar but memory state before >> and because the membar has a wide memory effect, the load is anti >> dependent on the membar: the graph is broken (the load can't be pinned >> after the membar and anti dependent on it). >> >> In short, the problem is that the graph is transformed under the >> assumption that the `ArrayCopy` has a narrow effect but the >> `ArrayCopy` is expanded to a subgraph that has a wide memory >> effect. The fix I propose is to not insert a membar with a wide memory >> effect. We still need a membar when the destination is non escaping >> because the expanded `ArrayCopy`, if it writes to a tightly allocated >> array, writes to raw memory and not to the destination memory slice. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > -XX:+TraceLoopOpts fix Thanks a lot for fixing this @rwestrel! ------------- PR Review: https://git.openjdk.org/jdk/pull/23465#pullrequestreview-2710835894 From dfenacci at openjdk.org Mon Mar 24 15:48:21 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 24 Mar 2025 15:48:21 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v5] In-Reply-To: References: <4hKl3zRJ6EP4QA-iuKiEpdwIqFk2-YvrpixAGy_VidU=.e9490e22-7751-41a6-a3e7-202930be570a@github.com> Message-ID: <5oGeRDTLETGizI0hd14DBW5z3qoi7-IaI_3ESDhBH2c=.2dd179fe-5f0d-4e93-a452-aa50ebd29c68@github.com> On Mon, 24 Mar 2025 12:01:30 GMT, Christian Hagedorn wrote: >> `ArrayCopyNode::may_modify()` performs some pattern matching and needs to be in sync with the shape of the array copy once expanded. If that shape changes then `ArrayCopyNode::may_modify()` needs to be adjusted. The code you point to was added when the shape of the expanded array copy was changed to avoid a complicated update to the pattern matching in `ArrayCopyNode::may_modify()`. 
What I propose is to get rid of the pattern matching because it's fragile and to instead always use the trick from that change where the final `MemBarNode` is marked, so make it unconditional. > > Makes sense, thanks for the explanation! Do we still need `is_partial_array_copy` in production builds? It seems to be used only in an assertion block. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2010436381 From epeter at openjdk.org Mon Mar 24 15:52:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 15:52:28 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v27] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 13:35:02 GMT, Johannes Graham wrote: >> I also see that https://github.com/openjdk/jdk/pull/2776 and https://github.com/openjdk/jdk/pull/4136 were mentioned here. Both of those are related and have no IR tests of their own, yikes! We have to ensure that we cover those old cases, and then new ones here, so that we do not get any accidental regressions. >> >> Maybe that's all already covered in other existing tests or the tests you added. Can you please provide a summary of all tests and what cases they cover in the PR description? It would help a lot for reviewing. > > Hi @eme64, do you have any more recommendations on this? 
@j3graham something funny happened with the PR description, you may want to fix that: ![image](https://github.com/user-attachments/assets/b25b6a7c-24f6-4e90-a220-5ed94ce06f8b) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2748595447 From epeter at openjdk.org Mon Mar 24 15:52:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 15:52:28 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v42] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 14:28:52 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. >> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. 
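The bounds reasoning in the quoted description can be illustrated with a small Java sketch. It is not the HotSpot implementation: the class and method names are hypothetical, and it only covers the case where both inputs are known to be non-negative. The idea it assumes is that `a ^ b` can never set a bit above the highest bit set in either upper limit, so filling every bit from that position down gives a safe (not necessarily tight) upper bound. The brute-force check mirrors the 4-bit exhaustive testing mentioned for `test_xor_node.cpp`.

```java
// Hypothetical sketch, not HotSpot code: upper bound of a ^ b for
// 0 <= a <= hiA and 0 <= b <= hiB.
final class XorBoundSketch {

    public static int xorUpperBound(int hiA, int hiB) {
        if (hiA < 0 || hiB < 0) {
            throw new IllegalArgumentException("sketch only covers non-negative ranges");
        }
        int m = hiA | hiB;
        // No value within the ranges has a bit above the highest set bit of m,
        // so neither does a ^ b. Setting that bit and everything below it
        // yields an upper bound.
        return m == 0 ? 0 : -1 >>> Integer.numberOfLeadingZeros(m);
    }

    // Brute-force check over every pair of 4-bit upper limits.
    public static boolean holdsForAllFourBitRanges() {
        for (int hiA = 0; hiA < 16; hiA++) {
            for (int hiB = 0; hiB < 16; hiB++) {
                int bound = xorUpperBound(hiA, hiB);
                for (int a = 0; a <= hiA; a++) {
                    for (int b = 0; b <= hiB; b++) {
                        if ((a ^ b) > bound) {
                            return false;
                        }
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(xorUpperBound(5, 9));          // prints 15
        System.out.println(holdsForAllFourBitRanges());   // prints true
    }
}
```

When both inputs are constants the ranges collapse to single values, and the exact result `a ^ b` is preferable to any bound, which is presumably why the change gives priority to constant folding.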
> > Johannes Graham has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits: > > - Merge branch 'openjdk:master' into xor_const > - Merge branch 'openjdk:master' into xor_const > - invert comparison in tests > - update bug numbers and summary > - add test of random ranges > - consistency > - Merge branch 'openjdk:master' into xor_const > - widen range of test values; add missing comment > - a few more tests > - add comments > > Co-authored-by: Emanuel Peter > - ... and 47 more: https://git.openjdk.org/jdk/compare/02a4ce23...06537f21 Thanks for the updates @j3graham! Could we have the same tests for long as for int? I see that the `testRandomLimits` only exists for int right now. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23089#pullrequestreview-2710852134 From sparasa at openjdk.org Mon Mar 24 15:53:12 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 24 Mar 2025 15:53:12 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v14] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 10:26:13 GMT, Emanuel Peter wrote: > I cannot really review the content, nor can I really test the APX features. But I also cannot directly see any issues with it. > > Therefore, I'll approve. Once the hardware is available, we will probably discover some issues and have to fix them then. > > Thanks @vamsi-parasa for the work! Thank you Emanuel (@eme64) for doing the review, testing and approving the PR! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2748602739 From epeter at openjdk.org Mon Mar 24 15:57:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 15:57:26 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v27] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 13:35:02 GMT, Johannes Graham wrote: >> I also see that https://github.com/openjdk/jdk/pull/2776 and https://github.com/openjdk/jdk/pull/4136 were mentioned here. Both of those are related and have no IR tests of their own, yikes! We have to ensure that we cover those old cases, and then new ones here, so that we do not get any accidental regressions. >> >> Maybe that's all already covered in other existing tests or the tests you added. Can you please provide a summary of all tests and what cases they cover in the PR description? It would help a lot for reviewing. > > Hi @eme64, do you have any more recommendations on this? @j3graham I think the VM changes look good, and the tests are almost there. So I launched some testing. Please ping me in a day for the results! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2748616671 From sparasa at openjdk.org Mon Mar 24 16:02:46 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 24 Mar 2025 16:02:46 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v16] In-Reply-To: References: Message-ID: > The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. > > For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. 
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Update copyright years in test/hotspot/gtest/x86/x86-asmtest.py Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23501/files - new: https://git.openjdk.org/jdk/pull/23501/files/87d08a08..06c52ce3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23501&range=14-15 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23501/head:pull/23501 PR: https://git.openjdk.org/jdk/pull/23501 From epeter at openjdk.org Mon Mar 24 16:02:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 16:02:46 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v16] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 16:00:05 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. >> >> For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright years in test/hotspot/gtest/x86/x86-asmtest.py > > Co-authored-by: Emanuel Peter Re-approved :) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23501#pullrequestreview-2710905459 From epeter at openjdk.org Mon Mar 24 16:02:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 16:02:46 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v15] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 15:47:37 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. >> >> For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright for x86-asmtest.py test/hotspot/gtest/x86/x86-asmtest.py line 1: > 1: # Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. Suggestion: # Copyright (c) 2024, 2025, Oracle and/or its affiliates. All rights reserved. We update the copyright like this ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23501#discussion_r2010467195 From sparasa at openjdk.org Mon Mar 24 16:05:23 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 24 Mar 2025 16:05:23 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v16] In-Reply-To: References: Message-ID: <0c36D3W-1GHo4gbSmHye1tJKsDxJLjG3uxt0Hb0Qxpo=.6ce172e4-2e95-44d3-a293-315dcb8e64a5@github.com> On Mon, 24 Mar 2025 16:00:04 GMT, Emanuel Peter wrote: > Re-approved :) Thank you, Emanuel! 
:) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2748639781 From epeter at openjdk.org Mon Mar 24 16:19:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 16:19:19 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations [v3] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 14:13:04 GMT, Hannes Greule wrote: >> Nice idea! Thanks for the work :) > > @eme64 could you take another look? Thanks! @SirYwell The code now looks really good, I launched some tests. Please ping me again in a day for the results! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23851#issuecomment-2748688541 From duke at openjdk.org Mon Mar 24 16:23:16 2025 From: duke at openjdk.org (Vixea) Date: Mon, 24 Mar 2025 16:23:16 GMT Subject: RFR: 8352607: RISC-V: use cmove in min/max when Zicond is supported In-Reply-To: <6shrH4rG_Ne1r7q99ptFITsnkZpo89ShLiXfOTkcfG0=.186f153b-7539-48d5-bb21-dce816f7f9f5@github.com> References: <6Vx1NT54rOWXHIAj1Ug2Ike3A7bTmIYSOGzp09GA0pg=.1ccd1d10-fed8-4194-942d-a25d7c39b68b@github.com> <6shrH4rG_Ne1r7q99ptFITsnkZpo89ShLiXfOTkcfG0=.186f153b-7539-48d5-bb21-dce816f7f9f5@github.com> Message-ID: On Mon, 24 Mar 2025 09:33:29 GMT, Robbin Ehn wrote: >>> better to have separate match rules. >> >> We could also do the similar thing to `CMoveX`, for this part I can do it in a separate PR together if this one is accepted. > > Yes, good point, I think we need to update the cost for some others as well. > We can do that separately. 
I'm pretty sure the P550 (HiFive Premier P550) doesn't support Zicond but does support Zbb and Zba. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24153#discussion_r2010504437 From duke at openjdk.org Mon Mar 24 16:23:16 2025 From: duke at openjdk.org (Vixea) Date: Mon, 24 Mar 2025 16:23:16 GMT Subject: RFR: 8352607: RISC-V: use cmove in min/max when Zicond is supported In-Reply-To: References: <6Vx1NT54rOWXHIAj1Ug2Ike3A7bTmIYSOGzp09GA0pg=.1ccd1d10-fed8-4194-942d-a25d7c39b68b@github.com> <6shrH4rG_Ne1r7q99ptFITsnkZpo89ShLiXfOTkcfG0=.186f153b-7539-48d5-bb21-dce816f7f9f5@github.com> Message-ID: <7Gpjkj3ilwHM-qjrinG_woBi-jwwjRVQS1zOjYNMVxQ=.e514a754-07c6-4a5f-bbe8-5cb23b28520f@github.com> On Mon, 24 Mar 2025 16:16:11 GMT, Vixea wrote: >> Yes, good point, I think we need to update the cost for some others as well. >> We can do that separately. > > I'm pretty sure the P550 (HiFive Premier P550) doesn't support Zicond but does support Zbb and Zba. Umm, anyway, the P550 boards (HiFive Premier and Megrez) don't support Zicond but do support Zbb/Zba. 
https://github.com/llvm/llvm-project/commit/5d03235c73476dfa3d2dd48c76de106fd1aa2ac7 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24153#discussion_r2010510035 From duke at openjdk.org Mon Mar 24 16:27:11 2025 From: duke at openjdk.org (Vivek Deshpande) Date: Mon, 24 Mar 2025 16:27:11 GMT Subject: RFR: 8350609: Cleanup unknown unwind opcode (0xB) for windows In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 03:58:17 GMT, Dhamoder Nalla wrote: > This PR is to clean-up unknown unwind opcodes (0xB) in Windows intrinsic functions introduced in commit https://github.com/openjdk/jdk17u-dev/commit/9f05c411e6d6bdf612cf0cf8b9fe4ca9ecde50d1#diff-a024df6bcd94607260545e647922261703a652dee1afadb1fa758f6e74a568d1 > > ![image](https://github.com/user-attachments/assets/5b295365-ba8e-4fd6-8b8b-f7243f80a496) > > According to the Windows unwind Opcodes outlined at https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170#unwind-operation-code, the opcode 0xB (1011) is not a valid Opcode, as the valid opcodes range from 0 to 10. > > Test performed: > 1. tier1 tests > 2. Vector tests under /test/jdk/incubator/vector Marked as reviewed by vivdesh at github.com (no known OpenJDK username). ------------- PR Review: https://git.openjdk.org/jdk/pull/23707#pullrequestreview-2710986278 From duke at openjdk.org Mon Mar 24 16:27:12 2025 From: duke at openjdk.org (Vivek Deshpande) Date: Mon, 24 Mar 2025 16:27:12 GMT Subject: RFR: 8350609: Cleanup unknown unwind opcode (0xB) for windows In-Reply-To: References: <4BUZBMlLC1zPnsieDNsbkji0xcmHLd_VHRN3bEhpJ3A=.5d2b315c-20de-4770-93e1-846cd733cde0@github.com> Message-ID: <7C06TOfg78KUXEq1JimVQvutBmim1Qq8ev8JN-YsDlo=.f8488a31-b3bc-4937-acfc-0425818dd77a@github.com> On Fri, 21 Mar 2025 18:05:54 GMT, Dhamoder Nalla wrote: >>> @dhanalla As @vivdesh asked above: do you have a regression test for this? >>> >>> You also have this warning above: >>> >>>
Warning: Found leading lowercase letter in issue title for 8350609: cleanup unknown unwind opcode (0xB) for windows >> I addressed the warning, and regarding regression tests, I have validated the Jtreg Tier 1 tests, including vector-specific tests under /test/jdk/incubator/vector. > >> @dhanalla Thanks for fixing the warning and running some tests! >> >> From my understanding, those tests passed before your patch here, correct? If so, then I'm wondering if there could be a regression test for this "unknown unwind opcode", that fails before your patch and passes with your patch? How feasible is that? > > Thanks @eme64, > We are just cleaning up the unknown unwind codes that are not required. The unwind instructions in the .text section remain untouched. > The only difference we see after this change is that the output of 'dumpbin.exe /unwindinfo jsvml.dll' will no longer display any unknown unwind opcodes identified in the DLL corresponding to these methods. Thanks @dhanalla. Looks good to me as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23707#issuecomment-2748708478 From epeter at openjdk.org Mon Mar 24 16:29:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Mar 2025 16:29:17 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v6] In-Reply-To: References: Message-ID: On Sat, 22 Mar 2025 17:55:27 GMT, Jatin Bhateja wrote: >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 >> >> The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). >> >> Summary of changes included with the patch: >> 1. C2 compiler New Vector IR creation. >> 2. Auto-vectorization support. >> 3. x86 backend implementation. >> 4. New IR verification test for each newly supported vector operation. 
>> >> Following are the performance numbers of Float16OperationsBenchmark >> >> System : Intel(R) Xeon(R) Processor code-named Granite Rapids >> Frequency fixed at 2.5 GHz >> >> >> Baseline
>> Benchmark                                                      (vectorDim)   Mode  Cnt     Score   Error   Units
>> Float16OperationsBenchmark.absBenchmark                               1024  thrpt    2  4191.787          ops/ms
>> Float16OperationsBenchmark.addBenchmark                               1024  thrpt    2  1211.978          ops/ms
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16            1024  thrpt    2   493.026          ops/ms
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16         1024  thrpt    2   612.430          ops/ms
>> Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16         1024  thrpt    2   616.012          ops/ms
>> Float16OperationsBenchmark.divBenchmark                               1024  thrpt    2   604.882          ops/ms
>> Float16OperationsBenchmark.dotProductFP16                             1024  thrpt    2   410.798          ops/ms
>> Float16OperationsBenchmark.euclideanDistanceDequantizedFP16           1024  thrpt    2   602.863          ops/ms
>> Float16OperationsBenchmark.euclideanDistanceFP16                      1024  thrpt    2   640.348          ops/ms
>> Float16OperationsBenchmark.fmaBenchmark                               1024  thrpt    2   809.175          ops/ms
>> Float16OperationsBenchmark.getExponentBenchmark                       1024  thrpt    2  2682.764          ops/ms
>> Float16OperationsBenchmark.isFiniteBenchmark                          1024  thrpt    2  3373.901          ops/ms
>> Float16OperationsBenchmark.isFiniteCMovBenchmark                      1024  thrpt    2  1881.652          ops/ms
>> Float16OperationsBenchmark.isFiniteStoreBenchmark                     1024  thrpt    2  2273.745          ops/ms
>> Float16OperationsBenchmark.isInfiniteBenchmark                        1024  thrpt    2  2147.913          ops/ms
>> Float16OperationsBenchmark.isInfiniteCMovBen...
> > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Removing Generator dependency on incubation module Quickly scanned the non-x64 VM changes, and it looks reasonable. I was wondering if you want to handle Float64 reductions as well though... actually that may be better to do in a separate PR. 
src/hotspot/share/opto/vectornode.cpp line 1023: > 1021: VectorNode* VectorReinterpretNode::make(Node* n, const TypeVect* dst_vt, const TypeVect* src_vt) { > 1022: return new VectorReinterpretNode(n, dst_vt, src_vt); > 1023: } This seems like an unnecessary redirection... the arguments and output are the same. Do we need it? ------------- PR Review: https://git.openjdk.org/jdk/pull/22755#pullrequestreview-2710984004 PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r2010517584 From duke at openjdk.org Mon Mar 24 16:44:22 2025 From: duke at openjdk.org (duke) Date: Mon, 24 Mar 2025 16:44:22 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v16] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 16:02:46 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. >> >> For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright years in test/hotspot/gtest/x86/x86-asmtest.py > > Co-authored-by: Emanuel Peter @vamsi-parasa Your change (at version 06c52ce3db3a426d2d3d020a884609f81a9bad5a) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2748759246 From sparasa at openjdk.org Mon Mar 24 16:47:20 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 24 Mar 2025 16:47:20 GMT Subject: Integrated: 8349582: APX NDD code generation for OpenJDK In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 20:26:49 GMT, Srinivas Vamsi Parasa wrote: > The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding. > > For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. This pull request has now been integrated. Changeset: c87e1be0 Author: Srinivas Vamsi Parasa URL: https://git.openjdk.org/jdk/commit/c87e1be0526fdd656bf0601542db6b92ccea567f Stats: 3625 lines in 5 files changed: 2018 ins; 52 del; 1555 mod 8349582: APX NDD code generation for OpenJDK Reviewed-by: epeter, jbhateja, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/23501 From sparasa at openjdk.org Mon Mar 24 16:57:17 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 24 Mar 2025 16:57:17 GMT Subject: RFR: 8348638: Performance regression in Math.tanh In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 09:44:32 GMT, Mohamed Issa wrote: > The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. > > 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. > 2. 
Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 20**. This new endpoint is the closest value above the minimum (**55 * ln(2) / 2**) required for correctness that's possible when only retrieving the topmost word of the input register. > > The results of all tests posted below were captured with an [Intel® Xeon® 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v24-b33](https://github.com/openjdk/jdk/releases/tag/jdk-24%2B33) as the baseline version. > > For performance data collected with the regression micro-benchmark referenced in the bug report, see the table below. Each result is the mean of 3 individual runs. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly improve execution times to the point where they are almost at parity with the baseline. Also, there is almost no impact to the low value (1, 2) scenarios. >
> | Input range (+/-) | Baseline (ms) | No fix (ms) | With fix (ms) | No fix vs baseline (%) | Fix vs baseline (%) |
> | :------------------: | :-------------: | :-----------: | :-------------: | :----------------------: | :-------------------: |
> | 1 | 1842 | 1961 | 1969 | +6.46 | +6.89 |
> | 2 | 2102 | 2010 | 1998 | -4.38 | -4.95 |
> | 100 | 801 | 1018 | 716 | +27.09 | -10.61 |
> | 1000 | 498 | 803 | 519 | +61.24 | +4.22 |
> | 10000 | 474 | 755 | 491 | +59.28 | +3.59 |
> | 100000 | 473 | 758 | 491 | +60.25 | +3.81 ...
src/hotspot/cpu/x86/stubGenerator_x86_64_tanh.cpp line 73: > 71: // Special cases: > 72: // tanh(NaN) = quiet NaN, and raise invalid exception > 73: // tanh(+/-INF) = +/-1 Thanks for fixing this. Yudi Zheng (@mur47x111) posted a comment about this on my PR after it was integrated. 
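The cutoff arithmetic in the quoted description can be checked with a short Java sketch. This is one reading of the numbers, not the stub's code, and the names are hypothetical. It assumes that 55 * ln(2) / 2 comes from requiring 1 - tanh(|x|), which is roughly 2 * exp(-2|x|) for large |x|, to fall below half an ulp under 1.0 (2^-54). It also assumes "topmost word" means the top 16 bits of the double (sign, exponent, and four mantissa bits), which in [16, 32) can only resolve steps of 1.0, making 20 the first usable threshold above about 19.06. `StrictMath.tanh` is used so the results do not depend on whether an intrinsic is active.

```java
// Hypothetical sketch, not the stub generator's code.
final class TanhCutoffSketch {

    // |x| at which 2 * exp(-2|x|) equals 2^-54, i.e. exp(-2|x|) = 2^-55.
    public static double saturationThreshold() {
        return 55.0 * Math.log(2.0) / 2.0;
    }

    // Top 16 bits of |x|: 11 exponent bits and the 4 leading mantissa bits
    // (the sign bit is cleared by Math.abs).
    public static int topWord(double x) {
        return (int) (Double.doubleToRawLongBits(Math.abs(x)) >>> 48);
    }

    public static void main(String[] args) {
        System.out.println(saturationThreshold());            // about 19.0615
        System.out.println(StrictMath.tanh(20.0) == 1.0);     // prints true
        System.out.println(StrictMath.tanh(19.0) < 1.0);      // prints true
        // 19.0 through 19.99... all share one top word, so a comparison on
        // the top word alone cannot place the cutoff between 19 and 20.
        System.out.println(Integer.toHexString(topWord(19.5)));  // prints 4033
        System.out.println(Integer.toHexString(topWord(20.0)));  // prints 4034
    }
}
```

Under these assumptions, a check of the form `topWord(x) >= 0x4034` is equivalent to `|x| >= 20` for finite inputs, which matches the endpoint chosen in the description.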
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2010579208 From kvn at openjdk.org Mon Mar 24 17:16:22 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 24 Mar 2025 17:16:22 GMT Subject: RFR: 8352696: JFR: assert(false): EA: missing memory path [v4] In-Reply-To: References: Message-ID: <1_pSpJqymSS7Cmxl8RUbULH2toJC5A_HbbRLyXTpTnc=.5cd6f2fc-d4ef-4507-b2b4-2d215c3955fa@github.com> On Mon, 24 Mar 2025 12:39:56 GMT, Markus Grönlund wrote: >> Greetings, >> >> In debug builds, we can sometimes assert because C2's Escape Analysis does not recognize a pattern where one input of memory Phi node is a MergeMem node, and another is a RAW store. >> >> This pattern is created by the jdk.jfr.internal.JVM.commit() intrinsic, which is inlined because of inlining JFR events, for example jdk.VirtualThreadStart. >> >> As a result, EA complains about a strange memory graph. >> >> Testing: jdk_jfr >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > fold Looks good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24192#pullrequestreview-2711139109 From duke at openjdk.org Mon Mar 24 17:18:13 2025 From: duke at openjdk.org (duke) Date: Mon, 24 Mar 2025 17:18:13 GMT Subject: RFR: 8350609: Cleanup unknown unwind opcode (0xB) for windows In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 03:58:17 GMT, Dhamoder Nalla wrote: > This PR is to clean-up unknown unwind opcodes (0xB) in Windows intrinsic functions introduced in commit https://github.com/openjdk/jdk17u-dev/commit/9f05c411e6d6bdf612cf0cf8b9fe4ca9ecde50d1#diff-a024df6bcd94607260545e647922261703a652dee1afadb1fa758f6e74a568d1 > > ![image](https://github.com/user-attachments/assets/5b295365-ba8e-4fd6-8b8b-f7243f80a496) > > According to the Windows unwind Opcodes outlined at https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170#unwind-operation-code, the opcode 0xB (1011) is not a valid Opcode, as the valid opcodes range from 0 to 10. > > Test performed: > 1. tier1 tests > 2. Vector tests under /test/jdk/incubator/vector @dhanalla Your change (at version 8d4eee1452da34391059fd2757dd97b895037eea) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23707#issuecomment-2748859286 From eastigeevich at openjdk.org Mon Mar 24 17:28:11 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 24 Mar 2025 17:28:11 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v6] In-Reply-To: References: <8BMaz7Ssnwk6ywuPYfgaL1VSN7XZYBzpvMWRsG12-Eg=.441d03b6-4a0b-479b-b789-d9bfd03bd346@github.com> <0DkMl4UCWkNIUpFJKP2z4T-nP0NjJsrhbuczJLeWVHM=.64561b36-8472-4159-8d90-7cbec2b58d5d@github.com> Message-ID: On Tue, 18 Mar 2025 19:59:31 GMT, Vladimir Kozlov wrote: >> Hi @dean-long, >> I see two changes can be made in nmethod: >> 1. Call sites are patched because of changes in callees. >> 2. Oops used in nmethod are updated. 
>> >> The first change should not be a problem if we clear all call sites. It's already done in the current code by clearing inline caches of the original nmethod. As discussed, we should not clear inline caches of the original but of the copy. >> >> How is the second change (oops) addressed when we create a new nmethod? > >> How is the second change (oops) addressed when we create a new nmethod > > Called from the `nmethod()` constructor, [nmethod::copy_values()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1741) replaces handles used during compilation with oops. > > Note, `new_nmethod()` holds `CodeCache_lock` and `ciEnv::register_method()` holds `Compile_lock` (and `MethodCompileQueue_lock`). @vnkozlov @dean-long It looks like relocation can be performed correctly only at a safepoint. GC updating oops concurrently with relocation is an issue. The safepoint requirement will limit use cases of relocation. Relocations should not be done often and should be done in a batch. On the other hand, relocating at a safepoint will simplify clone code patching. We need to fix offsets in call instructions at call sites. We don't need to clear them. What do you think?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2010629936 From dhanalla at openjdk.org Mon Mar 24 17:43:19 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Mon, 24 Mar 2025 17:43:19 GMT Subject: Integrated: 8350609: Cleanup unknown unwind opcode (0xB) for windows In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 03:58:17 GMT, Dhamoder Nalla wrote: > This PR is to clean-up unknown unwind opcodes (0xB) in Windows intrinsic functions introduced in commit https://github.com/openjdk/jdk17u-dev/commit/9f05c411e6d6bdf612cf0cf8b9fe4ca9ecde50d1#diff-a024df6bcd94607260545e647922261703a652dee1afadb1fa758f6e74a568d1 > > ![image](https://github.com/user-attachments/assets/5b295365-ba8e-4fd6-8b8b-f7243f80a496) > > According to the Windows unwind Opcodes outlined at https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170#unwind-operation-code, the opcode 0xB (1011) is not a valid Opcode, as the valid opcodes range from 0 to 10. > > Test performed: > 1. tier1 tests > 2. Vector tests under /test/jdk/incubator/vector This pull request has now been integrated. 
Changeset: a54445f7 Author: Dhamoder Nalla Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/a54445f789c7e37c03b28e07a7fdaa83672e3edc Stats: 112 lines in 22 files changed: 0 ins; 88 del; 24 mod 8350609: Cleanup unknown unwind opcode (0xB) for windows Reviewed-by: sviswanathan, epeter ------------- PR: https://git.openjdk.org/jdk/pull/23707 From kvn at openjdk.org Mon Mar 24 17:54:15 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 24 Mar 2025 17:54:15 GMT Subject: RFR: 8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 08:22:57 GMT, Emanuel Peter wrote: > This was a fuzzer failure, which hit an assert in SuperWord: > > `# assert(_cl->is_multiversion_fast_loop() == (_multiversioning_fast_proj != nullptr)) failed: must find the multiversion selector IFF loop is a multiversion fast loop` > > We had a fast main loop, but it could not find the `multiversion_if`. The reason was that the loop was a `PeelMainPost` loop, i.e. there is no pre-loop but only a single peeled iteration. This makes the pattern matching from main-loop via pre-loop to `multiversion_if` impossible. > > I'm proposing two changes in this PR: > - We must check `peel_only`, to see if we are in a `PeelMainPost` or `PreMainPost` case, and only do multiversioning if we know that there will be a pre-loop. > - In `eliminate_useless_multiversion_if` we should already detect that a main-loop that is marked as multiversioned should be able to find its `multiversion_if`. I'm removing its multiversioning marking if we cannot find the `multiversion_if`. > > I added 2 tests: > - The fuzzer generated test that hits the assert before this patch. > - An IR test that checks that we do not multiversion in a `PeelMainPost` loop case. > > --------------- > > **FYI**: I tried to add an assert in `eliminate_useless_multiversion_if` that we must always find the `multiversion_if` from a multiversioned main loop. 
But there are cases where this can fail. Here is an example: > > `test/hotspot/jtreg/compiler/locks/TestSynchronizeWithEmptyBlock.java` > > With flags: `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation` > > > Counted Loop: N537/N176 counted [int,100),+1 (-1 iters) > Loop: N0/N0 has_sfpt > Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 495 } > Loop: N536/N535 > Loop: N537/N176 counted [int,100),+1 (-1 iters) has_sfpt strip_mined > Loop: N379/N383 limit_check profile_predicated predicated counted [int,int),+1 (4 iters) pre rc has_sfpt > Loop: N353/N357 counted [int,1000),+1 (4 iters) post rc has_sfpt > Multiversion Loop: N537/N176 counted [int,100),+1 (100 iters) has_sfpt strip_mined > PreMainPost Loop: N537/N176 counted [int,100),+1 (100 iters) multiversion_fast has_sfpt strip_mined > Unroll 2 Loop: N537/N176 counted [int,100),+1 (100 iters) main multiversion_fast has_sfpt strip_mined > Poor node estimate: 306 >> 92 > Loop: N0/N0 has_sfpt > Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 } > Loop: N556/N557 sfpts={ 559 } > Loop: N552/N554 counted... Looks reasonable. > If reviewers think this really should be investigated, I could file a follow-up RFE. Yes, please. Can you check without "strip mining" if we can eliminate this loop? ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24183#pullrequestreview-2711259366 PR Comment: https://git.openjdk.org/jdk/pull/24183#issuecomment-2748962544 From duke at openjdk.org Mon Mar 24 17:54:20 2025 From: duke at openjdk.org (Johannes Graham) Date: Mon, 24 Mar 2025 17:54:20 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v42] In-Reply-To: <9V-aL5zZgNXWDXlHl3QB8brQGPhIKRmX7kXdlp2Z6lo=.cb88be68-f15d-4fd9-a4e0-be7952731e2f@github.com> References: <9V-aL5zZgNXWDXlHl3QB8brQGPhIKRmX7kXdlp2Z6lo=.cb88be68-f15d-4fd9-a4e0-be7952731e2f@github.com> Message-ID: On Mon, 24 Mar 2025 15:40:53 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/irTests/XorINodeIdealizationTests.java line 385: >> >>> 383: public int testRandomLimits(int x, int y) { >>> 384: x = RANGE_1.clamp(x); >>> 385: y = RANGE_2.clamp(y); >> >> Question: >> did you verify that this `RANGE_1` with its clamp values are really detected as constants by C2, and not seen as loads? Maybe that works, but I've never tried it myself. > > I just hope that the abstraction here does not invalidate our intent to have constant min/max bounds for the clamping ;) It does the right thing with the constants - `testFoldableRange` would fail otherwise. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23089#discussion_r2010671641 From mli at openjdk.org Mon Mar 24 18:16:16 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 24 Mar 2025 18:16:16 GMT Subject: Integrated: 8352615: [Test] RISC-V: TestVectorizationMultiInvar.java fails on riscv64 without rvv support In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 14:59:21 GMT, Hamlin Li wrote: > Hi, > Can you help to review this trivial patch? > TestVectorizationMultiInvar.java fails on riscv if rvv is not supported, as the test framework verifies `MaxVectorSize > 0`. > > Thanks! This pull request has now been integrated.
Changeset: b84b2927 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/b84b29278f710fabab703bc75dda1fa817bc13f6 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod 8352615: [Test] RISC-V: TestVectorizationMultiInvar.java fails on riscv64 without rvv support Reviewed-by: fyang, rehn ------------- PR: https://git.openjdk.org/jdk/pull/24157 From duke at openjdk.org Mon Mar 24 18:17:01 2025 From: duke at openjdk.org (Johannes Graham) Date: Mon, 24 Mar 2025 18:17:01 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v43] In-Reply-To: References: Message-ID: > An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. > > This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. > > In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: > - Bounds optimization of xor > - A check for `x ^ x = 0` > - Explicit testing of xor over booleans. > > Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. 
>
> ---------
> ### Progress
> - [x] Change must not contain extraneous whitespace
> - [x] Commit message must refer to an issue
> - [ ] Change must be properly reviewed (2 reviews required, with at least 2 [Reviewers](https://openjdk.org/bylaws#reviewer))
>
> ### Reviewers
> * [Quan Anh Mai](https://openjdk.org/census#qamai) (@merykitty - Committer) - Re-review required (review applies to [cf779497](https://git.openjdk.org/jdk/pull/23089/files/cf77949776f7a4601268c7291a5743c2eb164186))
>
> ### Reviewing
> Using git
>
> Checkout this PR locally: \
> `$ git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089` \
> `$ git checkout pull/23089`
>
> Update a local copy of the PR: \
> `$ git checkout pull/23089` \
> `$ git pull https://git.openjdk.org/jdk.git pull/23089/head`
>
> Using Skara CLI tools
>
> Checkout this PR locally: \
> `$ git pr checkout 23089`
>
> View PR using the GUI difftool: \
> `$ git pr show -t 23089`
>
> Using diff file
>
> Download this PR as a diff file: \
> https://git.openjdk.org/jdk/pull/23089.diff
>
> Using Webrev
>
> [Link to Webrev Comment](https://git.openjdk.org/jdk/pull/23089#issuecomment-2593992282)
Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: Add random range tests for Long ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23089/files - new: https://git.openjdk.org/jdk/pull/23089/files/06537f21..226fa3bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=42 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=41-42 Stats: 103 lines in 2 files changed: 89 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/23089.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089 PR: https://git.openjdk.org/jdk/pull/23089 From kvn at openjdk.org Mon Mar 24 18:22:09 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 24 Mar 2025 18:22:09 GMT Subject: RFR: 8352141: UBSAN: fix the left shift of negative value in relocInfo.cpp, internal_word_Relocation::pack_data_to() In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 13:18:25 GMT, Afshin Zafari wrote: > The `offset` variable used in the left-shift op can be a large number with its sign bit set. This makes it a negative value, which is UB for left-shift. Using the `java_left_shif()` function is the workaround to avoid UB. This function uses reinterpret_cast to cast from signed to unsigned and back. > > Tests: > linux-x64-debug tier1 on a UBSAN enabled build. Please update the bug report and the PR's description with the error output you see. The offset can't be a big number, since an internal_word relocation references an address in the same nmethod, unless there is a bug. The offset can be negative. Why is a signed left shift of a negative value UB?
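On the closing question: in C++ before C++20, left-shifting a negative signed value has undefined behavior, whereas Java defines `<<` on negative operands as a plain two's-complement bit shift. The Java sketch below shows that shifting the unsigned reinterpretation and truncating back yields exactly the Java result, which is the idea the PR description attributes to its helper. The class and method names here are hypothetical and only illustrate the arithmetic; this is not the HotSpot code.

```java
// Illustrative sketch only; names are hypothetical, not HotSpot identifiers.
final class LeftShiftSketch {
    // Shift the value as if it were unsigned, then reinterpret the low 32 bits as signed.
    static int shiftViaUnsigned(int value, int count) {
        long unsigned = Integer.toUnsignedLong(value); // same bits, viewed as 0 .. 2^32-1
        return (int) (unsigned << (count & 31));       // narrowing cast keeps the low 32 bits
    }

    public static void main(String[] args) {
        int offset = -3;
        System.out.println("direct shift      : " + (offset << 4));
        System.out.println("via unsigned shift: " + shiftViaUnsigned(offset, 4));
    }
}
```

Both lines print the same number, since the bit pattern produced by the shift does not depend on whether the operand is viewed as signed or unsigned.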
------------- PR Comment: https://git.openjdk.org/jdk/pull/24196#issuecomment-2749033445 From duke at openjdk.org Mon Mar 24 18:23:13 2025 From: duke at openjdk.org (Johannes Graham) Date: Mon, 24 Mar 2025 18:23:13 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v43] In-Reply-To: References: Message-ID: <60TmZGna-1uQ_duA527D1KBq6MjGJyWrx3vdler6tE4=.bac8d74b-7713-4da8-8ea2-71b497d03016@github.com> On Mon, 24 Mar 2025 18:17:01 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. >> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. > > Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: > > Add random range tests for Long Fixed the description and added randomized tests for Long. Happily min/max over Long is now optimized! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2749037551 From duke at openjdk.org Mon Mar 24 18:29:34 2025 From: duke at openjdk.org (Johannes Graham) Date: Mon, 24 Mar 2025 18:29:34 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v44] In-Reply-To: References: Message-ID: > An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. > > This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. > > In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: > - Bounds optimization of xor > - A check for `x ^ x = 0` > - Explicit testing of xor over booleans. > > Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. 
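The exhaustive 4-bit testing mentioned in the description can be illustrated with a short Java sketch. The bound used here is an assumption made for illustration (for non-negative inputs, `x ^ y` cannot set any bit above the highest bit set in either upper limit); it is not claimed to be the exact formula implemented in the PR, and the class and method names are hypothetical.

```java
// Illustrative sketch only; the bound is an assumed formula, not the PR's code.
final class XorBoundSketch {
    // Upper bound for x ^ y when 0 <= x <= hi1 and 0 <= y <= hi2.
    static int xorUpperBound(int hi1, int hi2) {
        int combined = hi1 | hi2;
        if (combined == 0) {
            return 0; // both operands are forced to zero
        }
        // Bits above the highest set bit of either limit stay zero in x ^ y.
        // For a highest bit of 2^30 the shift wraps to MIN_VALUE and the
        // subtraction wraps back to Integer.MAX_VALUE, which is the right bound.
        return (Integer.highestOneBit(combined) << 1) - 1;
    }

    // Checks the bound against every pair of values in every 4-bit range [0, hi].
    static boolean holdsForAllFourBitRanges() {
        for (int hi1 = 0; hi1 < 16; hi1++) {
            for (int hi2 = 0; hi2 < 16; hi2++) {
                int bound = xorUpperBound(hi1, hi2);
                for (int x = 0; x <= hi1; x++) {
                    for (int y = 0; y <= hi2; y++) {
                        if ((x ^ y) > bound) {
                            return false;
                        }
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("bound for [0,5] ^ [0,9]: " + xorUpperBound(5, 9));
        System.out.println("holds for all 4-bit ranges: " + holdsForAllFourBitRanges());
    }
}
```

The small search space is what makes exhaustive checking practical: all 256 pairs of upper limits and all values below them are covered in well under a millisecond.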
Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: Undo accidental changes to Int tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23089/files - new: https://git.openjdk.org/jdk/pull/23089/files/226fa3bf..8ca86fda Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=43 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=42-43 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23089.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089 PR: https://git.openjdk.org/jdk/pull/23089 From mgronlun at openjdk.org Mon Mar 24 19:23:56 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Mon, 24 Mar 2025 19:23:56 GMT Subject: RFR: 8352696: JFR: assert(false): EA: missing memory path [v5] In-Reply-To: References: Message-ID: > Greetings, > > In debug builds, we can sometimes assert because C2's Escape Analysis does not recognize a pattern where one input of memory Phi node is a MergeMem node, and another is a RAW store. > > This pattern is created by the jdk.jfr.internal.JVM.commit() intrinsic, which is inlined because of inlining JFR events, for example jdk.VirtualThreadStart. > > As a result, EA complains about a strange memory graph.
> > Testing: jdk_jfr > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: simplified test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24192/files - new: https://git.openjdk.org/jdk/pull/24192/files/c63fb608..cae054c9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24192&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24192&range=03-04 Stats: 38 lines in 1 file changed: 4 ins; 25 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24192.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24192/head:pull/24192 PR: https://git.openjdk.org/jdk/pull/24192 From chagedorn at openjdk.org Mon Mar 24 20:47:16 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 24 Mar 2025 20:47:16 GMT Subject: RFR: 8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 08:22:57 GMT, Emanuel Peter wrote: > This was a fuzzer failure, which hit an assert in SuperWord: > > `# assert(_cl->is_multiversion_fast_loop() == (_multiversioning_fast_proj != nullptr)) failed: must find the multiversion selector IFF loop is a multiversion fast loop` > > We had a fast main loop, but it could not find the `multiversion_if`. The reason was that the loop was a `PeelMainPost` loop, i.e. there is no pre-loop but only a single peeled iteration. This makes the pattern matching from main-loop via pre-loop to `multiversion_if` impossible. > > I'm proposing two changes in this PR: > - We must check `peel_only`, to see if we are in a `PeelMainPost` or `PreMainPost` case, and only do multiversioning if we know that there will be a pre-loop. > - In `eliminate_useless_multiversion_if` we should already detect that a main-loop that is marked as multiversioned should be able to find its `multiversion_if`. I'm removing its multiversioning marking if we cannot find the `multiversion_if`.
> > I added 2 tests: > - The fuzzer generated test that hits the assert before this patch. > - An IR test that checks that we do not multiversion in a `PeelMainPost` loop case. > > --------------- > > **FYI**: I tried to add an assert in `eliminate_useless_multiversion_if` that we must always find the `multiversion_if` from a multiversioned main loop. But there are cases where this can fail. Here is an example: > > `test/hotspot/jtreg/compiler/locks/TestSynchronizeWithEmptyBlock.java` > > With flags: `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation` > > > Counted Loop: N537/N176 counted [int,100),+1 (-1 iters) > Loop: N0/N0 has_sfpt > Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 495 } > Loop: N536/N535 > Loop: N537/N176 counted [int,100),+1 (-1 iters) has_sfpt strip_mined > Loop: N379/N383 limit_check profile_predicated predicated counted [int,int),+1 (4 iters) pre rc has_sfpt > Loop: N353/N357 counted [int,1000),+1 (4 iters) post rc has_sfpt > Multiversion Loop: N537/N176 counted [int,100),+1 (100 iters) has_sfpt strip_mined > PreMainPost Loop: N537/N176 counted [int,100),+1 (100 iters) multiversion_fast has_sfpt strip_mined > Unroll 2 Loop: N537/N176 counted [int,100),+1 (100 iters) main multiversion_fast has_sfpt strip_mined > Poor node estimate: 306 >> 92 > Loop: N0/N0 has_sfpt > Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 } > Loop: N556/N557 sfpts={ 559 } > Loop: N552/N554 counted... Looks good! > It seems that we are able to detect some loops as empty loops, including the pre-loop. But somehow the main-loop is not removed by "empty loop", and now this main-loop cannot traverse through the pre-loop to the multiversion_if. There are some bailouts in `IdealLoopTree::remove_main_post_loops()` (called from the empty loop removal optimization). So it seems that we cannot always eliminate the main loop when the pre loop is removed.
But maybe the main loop could still be eliminated in your case somehow - might be worth investigating further. src/hotspot/share/opto/loopTransform.cpp line 3435: > 3433: if (!peel_only) { > 3434: // We are going to add pre-loop and post-loop (PreMainPost). > 3435: // But should we also multi-version for auto-vectorization speculative Just a side note: It seems that we sometimes say "multiversion" and sometimes "multi-version". I find the latter more readable. Maybe we can make it consistent at some point in a separate task. This could also mean that we have `*_multi_version_*` in method names instead of `*_multiversion*_`. test/hotspot/jtreg/compiler/loopopts/superword/TestMultiversionWithPeelMainPost.java line 31: > 29: * @summary Test case where we used to Multiversion a PeelMainPost loop, > 30: * which is useless and triggered an assert later on. > 31: * @run driver compiler.loopopts.superword.TestMultiversionWithPeelMainPost Should be `main` to be able to add flags in the CI: Suggestion: * @run main compiler.loopopts.superword.TestMultiversionWithPeelMainPost test/hotspot/jtreg/compiler/loopopts/superword/TestPeelMainPostNoMultiversioning.java line 63: > 61: long y = multiplicator; > 62: for (int i = 0; i < 10_000; i++) { > 63: x *= y; // No memory load/stroe -> PeelMainPost Suggestion: x *= y; // No memory load/store -> PeelMainPost ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24183#pullrequestreview-2711610938 PR Review Comment: https://git.openjdk.org/jdk/pull/24183#discussion_r2010888155 PR Review Comment: https://git.openjdk.org/jdk/pull/24183#discussion_r2010883383 PR Review Comment: https://git.openjdk.org/jdk/pull/24183#discussion_r2010884684 From bulasevich at openjdk.org Mon Mar 24 22:27:50 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 24 Mar 2025 22:27:50 GMT Subject: RFR: 8352426: RelocIterator should correctly handle nullptr address of relocation data Message-ID: This is a follow-up to the recent #24102 patch. It addresses an issue where RelocIterator may receive a nullptr as the relocation table address. This change can also serve as an independent fix for JDK-8352112. RelocIterator::initialize() and RelocIterator::next() perform decrement/increment operations on an internal relocation pointer. If nm->relocation_begin() returns nullptr, this results in undefined behavior, as pointer arithmetic on nullptr is prohibited by the C++ Standard. Instead of introducing a null-check (which would add overhead in RelocIterator::next(), a performance-sensitive path), we initialize _current with a dummy static variable. This pointer is never dereferenced, so its actual value is not important - it just serves to avoid undefined behavior. The RelocIterator::RelocIterator constructor can initialize the _current pointer as well. However, in that place we have an assert to ensure that a nullptr value is not allowed, and it seems we do not need to apply the dummy value there. Testing: The fix has been verified against the failure in JDK-8352112. The issue no longer reproduces with this patch, regardless of whether the original fix from #24102 is applied.
------------- Commit messages: - 8352426: RelocIterator should correctly handle nullptr address of relocation data Changes: https://git.openjdk.org/jdk/pull/24203/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24203&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352426 Stats: 12 lines in 1 file changed: 9 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24203.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24203/head:pull/24203 PR: https://git.openjdk.org/jdk/pull/24203 From duke at openjdk.org Mon Mar 24 23:00:36 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 24 Mar 2025 23:00:36 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v7] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: - Relocate nmethod at safepoint - Fix windows build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/c8827627..a7f32409 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=05-06 Stats: 105 lines in 5 files changed: 44 ins; 46 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From sviswanathan at openjdk.org Tue Mar 25 00:32:17 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 25 Mar 2025 00:32:17 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v2] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 20:33:47 GMT, Jatin Bhateja wrote: >> This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. >> >> The special handling leverages the instruction semantics: the central idea is to shuffle the operands such that the smaller input is assigned to the second operand for the min operation, or the larger input is assigned to the second operand for the max operation. In addition, the result equals NaN if an unordered comparison detects the first input as a NaN value; otherwise we return the result of the min/max operation. >> >> Kindly review and share your feedback.
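The operand-shuffling idea can be sketched in Java on `float` values. The sketch assumes a simplified model of the hardware max in which the second operand is returned whenever the inputs are unordered (a NaN is present) or compare equal (for example +0.0 and -0.0). It selects the operand order from the sign bit of the first input, which is one possible way to realize the shuffle and is not necessarily the instruction sequence used in the patch. All names are hypothetical.

```java
// Illustrative sketch only; a simplified model, not the x86 backend code.
final class MaxMinSketch {
    // Assumed hardware rule: return the second operand unless first > second holds.
    // This covers NaN inputs and the +0.0 / -0.0 tie.
    static float hwMax(float first, float second) {
        return first > second ? first : second;
    }

    // Java-style max built on hwMax: order the operands by the sign of a,
    // then patch the one remaining case where the first operand is NaN.
    static float javaStyleMax(float a, float b) {
        boolean aNegative = Float.floatToRawIntBits(a) < 0; // sign bit set, includes -0.0
        float first = aNegative ? a : b;
        float second = aNegative ? b : a;
        float result = hwMax(first, second);
        return Float.isNaN(first) ? first : result;
    }

    public static void main(String[] args) {
        System.out.println(javaStyleMax(0.0f, -0.0f));     // +0.0 must win over -0.0
        System.out.println(javaStyleMax(1.0f, Float.NaN)); // NaN must propagate
        System.out.println(javaStyleMax(-2.0f, -3.0f));
    }
}
```

A min variant would follow the same pattern with the comparison and the sign test mirrored.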
>> >> Best Regards, >> Jatin >> >> [1] https://www.felixcloutier.com/x86/vminsh >> [2] https://www.felixcloutier.com/x86/vmaxsh > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Minor cleanup src/hotspot/cpu/x86/assembler_x86.cpp line 13758: > 13756: attributes.set_is_evex_instruction(); > 13757: attributes.set_embedded_opmask_register_specifier(mask); > 13758: attributes.reset_is_clear_context(); Why do we do reset_is_clear_context here? We want kdst bits to be set/reset and no merge context. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7093: > 7091: } > 7092: > 7093: void C2_MacroAssembler::scalar_max_min_fp16(int opcode, XMMRegister dst, XMMRegister src1, XMMRegister src2, Any reason we are not doing this along the lines of the scalar emit_fp_min_max? For the most common cases, an emit_fp_min_max based sequence would have much better latency. src/hotspot/cpu/x86/x86.ad line 1466: > 1464: case Op_MaxHF: > 1465: case Op_MinHF: > 1466: if (!VM_Version::supports_avx512bw()) { This check should be supports_avx512vlbw(). The scalar_max_min_fp16 needs avx512vl as well. src/hotspot/cpu/x86/x86.ad line 1469: > 1467: return false; > 1468: } > 1469: case Op_AddHF: Please add a comment here indicating fall through. test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 33: > 31: * @library /test/lib / > 32: * @summary Add special case handling for Float16.max/min x86 backend > 33: * @requires (os.simpleArch == "x64" & vm.cpu.features ~= ".*avx512_fp16.*" & vm.cpu.features ~= ".*avx512bw.*") avx512vl is also needed here. test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 57: > 55: > 56: @Run(test = "testMaxNaNOperands") > 57: @Warmup(1000) Warmup could also be removed.
test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 59: > 57: @Warmup(1000) > 58: public void launchMaxNaNOperands() { > 59: for (int i = 0; i < 10000; i++) { The loop could be removed throughout this test, don't need to test 10000 values. test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 63: > 61: RES = testMaxNaNOperands(SRC, Float16.NaN); > 62: if (!RES.equals(Float16.NaN)) { > 63: throw new AssertionError("input1 = NaN, input2 = " + SRC.floatValue() + ", expected = NaN, actual = " + RES.floatValue()); input1 is not NaN here. test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 94: > 92: RES = testMinNaNOperands(SRC, Float16.NaN); > 93: if (!RES.equals(Float16.NaN)) { > 94: throw new AssertionError("input1 = NaN, input2 = " + SRC.floatValue() + ", expected = NaN, actual = " + RES.floatValue()); input1 is not NaN here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2011090835 PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2010957028 PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2010941955 PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2010884350 PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2010958579 PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2011007302 PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2010960699 PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2010960003 PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2010961348 From xgong at openjdk.org Tue Mar 25 01:38:17 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 25 Mar 2025 01:38:17 GMT Subject: RFR: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations [v2] In-Reply-To: 
<2twhpJnhbQPC7I4jJGVlawsY9EkT8ZCYwa6xUxRTUls=.81ca56f4-7534-4bab-b98b-25252c0c7977@github.com> References: <5fk0CfImI-utRMfmsA78i_uQJjoej9bsLqxsAbRZHLk=.b5aece2b-090d-4fbc-aa64-3acf8fedc41b@github.com> <8gRkivkxdlGCezJE_ZtvkO7ONzLpIpzY0PXT-6MBNI8=.9719afc9-94e9-45ed-a2d4-63e6a3593402@github.com> <2twhpJnhbQPC7I4jJGVlawsY9EkT8ZCYwa6xUxRTUls=.81ca56f4-7534-4bab-b98b-25252c0c7977@github.com> Message-ID: <-3axORZlSlICLJcD2H5_SnmZGEUbGX7yw_V3RBz73So=.0c09460c-7778-4875-9ccf-b87630755a83@github.com> On Mon, 24 Mar 2025 12:16:49 GMT, Emanuel Peter wrote: >>> @XiaohongGong Testing launched! Please ping me after the weekend for the results ;) >> >> Thanks for your testing @eme64 . May I ask about the test results please? > > Thanks @XiaohongGong For the work ? Thanks so much for your review and test @eme64 ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23608#issuecomment-2749811814 From xgong at openjdk.org Tue Mar 25 01:38:18 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 25 Mar 2025 01:38:18 GMT Subject: Integrated: 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 01:47:10 GMT, Xiaohong Gong wrote: > Since PR [1] has added several new vector operations in VectorAPI and the X86 backend implementation for them, this patch adds the AArch64 backend part for NEON/SVE architectures. > > The performance of Vector API relative JMH micro benchmarks can improve about 70x ~ 95x on a NVIDIA Grace CPU, which is a 128-bit vector length sve2 architecture, with different UseSVE options. 
Here is the gain details: > > > Benchmark (size) Mode Cnt -XX:UseSVE=0 -XX:UseSVE=1 -XX:UseSVE=2 > ByteMaxVector.SADD 1024 thrpt 30 80.69x 79.70x 80.534x > ByteMaxVector.SADDMasked 1024 thrpt 30 84.08x 85.72x 85.901x > ByteMaxVector.SSUB 1024 thrpt 30 80.46x 80.27x 81.063x > ByteMaxVector.SSUBMasked 1024 thrpt 30 83.96x 85.26x 85.887x > ByteMaxVector.SUADD 1024 thrpt 30 80.43x 80.36x 81.761x > ByteMaxVector.SUADDMasked 1024 thrpt 30 83.40x 84.62x 85.199x > ByteMaxVector.SUSUB 1024 thrpt 30 79.93x 79.22x 79.714x > ByteMaxVector.SUSUBMasked 1024 thrpt 30 82.93x 85.02x 84.726x > ByteMaxVector.UMAX 1024 thrpt 30 78.73x 77.39x 78.220x > ByteMaxVector.UMAXMasked 1024 thrpt 30 82.62x 84.77x 85.531x > ByteMaxVector.UMIN 1024 thrpt 30 79.04x 77.80x 78.471x > ByteMaxVector.UMINMasked 1024 thrpt 30 83.11x 84.86x 86.126x > IntMaxVector.SADD 1024 thrpt 30 83.11x 83.07x 83.183x > IntMaxVector.SADDMasked 1024 thrpt 30 90.67x 91.80x 93.162x > IntMaxVector.SSUB 1024 thrpt 30 83.37x 82.82x 83.317x > IntMaxVector.SSUBMasked 1024 thrpt 30 90.85x 92.87x 94.201x > IntMaxVector.SUADD 1024 thrpt 30 82.76x 81.78x 82.679x > IntMaxVector.SUADDMasked 1024 thrpt 30 90.49x 91.93x 93.155x > IntMaxVector.SUSUB 1024 thrpt 30 82.92x 82.34x 82.525x > IntMaxVector.SUSUBMasked 1024 thrpt 30 90.60x 92.12x 92.951x > IntMaxVector.UMAX 1024 thrpt 30 82.40x 81.85x 82.242x > IntMaxVector.UMAXMasked 1024 thrpt 30 90.30x 92.10x 92.587x > IntMaxVector.UMIN 1024 thrpt 30 82.84x 81.43x 82.801x > IntMaxVector.UMINMasked 1024 thrpt 30 90.43x 91.49x 92.678x > LongMaxVector.SADD 1024 thrpt 30 82.01x 81.74x 82.153x > LongMaxVector... This pull request has now been integrated. 
Changeset: ba658a71 Author: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/ba658a71ba4372b42a496edee55400f5014815d4 Stats: 1151 lines in 8 files changed: 674 ins; 5 del; 472 mod 8349522: AArch64: Add backend implementation for new unsigned and saturating vector operations Reviewed-by: epeter, haosun, bkilambi ------------- PR: https://git.openjdk.org/jdk/pull/23608 From bulasevich at openjdk.org Tue Mar 25 02:33:10 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 25 Mar 2025 02:33:10 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache [v2] In-Reply-To: References: Message-ID: <-Ebzc6t0saRWpkdRmuj-6kCbQSuYiN3pxnEzMheW2dE=.b5d34814-d9ab-4869-9b8a-e20f1a0ea58c@github.com> On Thu, 20 Mar 2025 13:46:06 GMT, Evgeny Astigeevich wrote: >> This benchmark is used to check performance impact of the code cache being sparse. >> >> We use C2 compiler to compile the same Java method multiple times to produce as many code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods. 
>> >> Results: code region size 2M (2097152) bytes >> - Intel Xeon Platinum 8259CL >> >> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | >> |--- |--- |--- |--- |--- |--- |--- | >> |128 |1 |128 |19.577 |0.619 |us/op | | >> |128 |32 |4 |22.968 |0.314 |us/op |17.30% | >> |128 |48 |3 |22.245 |0.388 |us/op |13.60% | >> |128 |64 |2 |23.874 |0.84 |us/op |21.90% | >> |128 |80 |2 |23.786 |0.231 |us/op |21.50% | >> |128 |96 |1 |26.224 |1.16 |us/op |34% | >> |128 |112 |1 |27.028 |0.461 |us/op |38.10% | >> |256 |1 |256 |47.43 |1.146 |us/op | | >> |256 |32 |8 |63.962 |1.671 |us/op |34.90% | >> |256 |48 |5 |63.396 |0.247 |us/op |33.70% | >> |256 |64 |4 |66.604 |2.286 |us/op |40.40% | >> |256 |80 |3 |59.746 |1.273 |us/op |26% | >> |256 |96 |3 |63.836 |1.034 |us/op |34.60% | >> |256 |112 |2 |63.538 |1.814 |us/op |34% | >> |512 |1 |512 |172.731 |4.409 |us/op | | >> |512 |32 |16 |206.772 |6.229 |us/op |19.70% | >> |512 |48 |11 |215.275 |2.228 |us/op |24.60% | >> |512 |64 |8 |212.962 |2.028 |us/op |23.30% | >> |512 |80 |6 |201.335 |12.519 |us/op |16.60% | >> |512 |96 |5 |198.133 |6.502 |us/op |14.70% | >> |512 |112 |5 |193.739 |3.812 |us/op |12.20% | >> |768 |1 |768 |325.154 |5.048 |us/op | | >> |768 |32 |24 |346.298 |20.196 |us/op |6.50% | >> |768 |48 |16 |350.746 |2.931 |us/op |7.90% | >> |768 |64 |12 |339.445 |7.927 |us/op |4.40% | >> |768 |80 |10 |347.408 |7.355 |us/op |6.80% | >> |768 |96 |8 |340.983 |3.578 |us/op |4.90% | >> |768 |112 |7 |353.949 |2.98 |us/op |8.90% | >> |1024 |1 |1024 |368.352 |5.961 |us/op | | >> |1024 |32 |32 |463.822 |6.274 |us/op |25.90% | >> |1024 |48 |21 |457.674 |15.144 |us/op |24.20% | >> |1024 |64 |16 |477.694 |0.986 |us/op |29.70% | >> |1024 |80 |13 |484.901 |32.601 |us/op |31.60% | >> |1024 |96 |11 |480.8 |27.088 |us/op |30.50% | >> |1024 |112 |9 |474.416 |10.053 |us/op |28.80% | >> >> - AArch64 Neoverse N1 >> >> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff |... 
> > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Separate active methods and method calling them with 128Mb dummy space There are results for different implementations of Neoverse V2. All three CPUs show similar performance degradation as sparsity increases (i.e., as groupCount grows). This seems to be a common feature of the Neoverse V2 architecture. Azure Cobalt also degrades more sharply as the number of active methods increases.

SparseCodeCache | | G4 | | Azure Cobalt | | Google Axion |
-- | -- | -- | -- | -- | -- | -- | --
activeMethodCount | groupCount | us/op | ± | us/op | ± | us/op | ±
128 | 1 | 11.972 | 0.004 | 11.092 | 0.007 | 11.201 | 0.059
128 | 32 | 13.622 | 0.092 | 15.808 | 0.779 | 11.928 | 0.013
128 | 48 | 13.217 | 0.072 | 15.937 | 0.498 | 12.126 | 0.009
128 | 64 | 13.668 | 0.04 | 16.137 | 0.517 | 12.171 | 0.139
128 | 80 | 13.986 | 0.127 | 17.681 | 0.262 | 12.525 | 0.033
128 | 96 | 14.594 | 0.055 | 18.25 | 0.795 | 12.979 | 0.051
128 | 112 | 14.77 | 0.078 | 18.529 | 1.004 | 13.129 | 0.049
256 | 1 | 23.998 | 0.019 | 22.417 | 0.006 | 22.409 | 0.003
256 | 32 | 26.273 | 0.036 | 33.329 | 0.949 | 25.097 | 0.043
256 | 48 | 26.61 | 0.063 | 34.566 | 0.343 | 24.771 | 0.118
256 | 64 | 26.959 | 0.085 | 35.953 | 0.456 | 24.443 | 0.028
256 | 80 | 27.646 | 0.089 | 38.569 | 4.495 | 25.245 | 0.027
256 | 96 | 27.829 | 0.128 | 37.749 | 0.991 | 25.536 | 0.031
256 | 112 | 28.298 | 0.064 | 40.261 | 0.155 | 25.787 | 0.016
512 | 1 | 48.181 | 0.032 | 68.768 | 0.537 | 44.863 | 0.004
512 | 32 | 53.157 | 0.044 | 94.262 | 2.801 | 50.037 | 0.038
512 | 48 | 55.13 | 0.052 | 106.928 | 3.513 | 54.611 | 0.044
512 | 64 | 56.609 | 0.123 | 103.403 | 0.708 | 53.906 | 0.039
512 | 80 | 57.146 | 0.091 | 112.929 | 2.522 | 52.923 | 0.081
512 | 96 | 59.038 | 0.092 | 141.291 | 2.346 | 56.018 | 0.054
512 | 112 | 60.647 | 0.331 | 137.491 | 11.441 | 56.705 | 0.117
768 | 1 | 77.086 | 0.402 | 138.572 | 2.444 | 68.464 | 0.056
768 | 32 | 89.599 | 0.14 | 159.353 | 4.639 | 94.478 | 1.129
768 | 48 | 94.312 | 0.33 | 177.518 | 1.728 | 99.704 | 0.131
768 | 64 | 94.243 | 0.218 | 182.263 | 2.634 | 90.027 | 0.19
768 | 80 | 95.566 | 0.068 | 185.748 | 32.128 | 96.61 | 0.157
768 | 96 | 99.435 | 0.323 | 195.603 | 13.653 | 102.222 | 0.027
768 | 112 | 105.814 | 0.366 | 216.653 | 1.694 | 103.918 | 0.497
1024 | 1 | 110.407 | 1.27 | 203.428 | 2.049 | 97.032 | 0.739
1024 | 32 | 137.626 | 1.62 | 221.029 | 22.25 | 141.785 | 1.301
1024 | 48 | 141.191 | 0.372 | 233.768 | 5.211 | 146.639 | 2.779
1024 | 64 | 141.227 | 0.238 | 255.31 | 35.069 | 139.287 | 0.376
1024 | 80 | 148.555 | 0.157 | 252.645 | 24.165 | 155.4 | 0.301
1024 | 96 | 155.47 | 0.321 | 272.952 | 3.799 | 162.416 | 3.969
1024 | 112 | 158.288 | 0.568 | 247.452 | 9.267 | 151.082 | 0.204

------------- PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2749890493 From dlong at openjdk.org Tue Mar 25 02:38:18 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 25 Mar 2025 02:38:18 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v7] In-Reply-To: References: <8BMaz7Ssnwk6ywuPYfgaL1VSN7XZYBzpvMWRsG12-Eg=.441d03b6-4a0b-479b-b789-d9bfd03bd346@github.com> <0DkMl4UCWkNIUpFJKP2z4T-nP0NjJsrhbuczJLeWVHM=.64561b36-8472-4159-8d90-7cbec2b58d5d@github.com> Message-ID: On Mon, 24 Mar 2025 17:25:39 GMT, Evgeny Astigeevich wrote: >>> How the second change (oops) is addressed when we create a new nmethod >> >> Called from `nmethod()` constructor [nmethod::copy_values()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/nmethod.cpp#L1741) replaces handles used during compilation with oops. >> >> Note, `new_nmethod()` holds `CodeCache_lock` and `ciEnv::register_method()` holds `Compile_lock` (and `MethodCompileQueue_lock`). > > @vnkozlov @dean-long > It looks like the only way relocation can be performed correctly only at a safepoint. GC updating oops concurrently with relocation is an issue.
> The safepoint requirement will limit use cases of relocation. Relocations should not be done often and should be done in a batch. On another side, relocating at a safepoint will simplify clone code patching. We need to fix offsets in call instructions at call sites. We don't need to clear them. > What do you think? If the only issue was GC updating oops concurrently, we could try using CompiledICLocker instead of forcing a safepoint. But now that I think about it, there are other issues, like the state of the entry barrier, and other GC epoch counters, and copying those consistently may require a safepoint and/or additional GC fixup logic. I think we need a GC expert to weigh in here. There may be other issues we are missing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2011192641 From dlong at openjdk.org Tue Mar 25 03:21:13 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 25 Mar 2025 03:21:13 GMT Subject: RFR: 8352141: UBSAN: fix the left shift of negative value in relocInfo.cpp, internal_word_Relocation::pack_data_to() In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 13:18:25 GMT, Afshin Zafari wrote: > The `offset` variable used in left-shift op can be a large number with its sign-bit set. This makes a negative value which is UB for left-shift. Using `java_left_shif()` function is the workaround to avoid UB. This function uses reinterpret_cast to cast from signed to unsigned and back. > > Tests: > linux-x64-debug tier1 on a UBSAN enabled build. I guess it's UB because the value can go from negative to positive if the sign bit is lost. The negative offset is coming from scaled_offset(). We set the value to negative and then flip it back later. It might be worth investigating why we do this. Is it just a clever hack so we get 1 more short value, -1..-32768 vs 1..32767? I remember looking at a similar issue before. Déjà vu?
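For illustration only, here is a minimal sketch of the cast-through-unsigned idea mentioned in the PR description, plus a check for whether sign or value bits would be lost. All names (`shift_left_via_unsigned`, `shift_left_fits`, `shift_left_checked`) are hypothetical and are not the HotSpot functions; the sketch assumes 32-bit values, a shift count below 32, and `static_cast` rather than the `reinterpret_cast` the description refers to.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a HotSpot function): perform the shift in the
// unsigned domain, where it is well-defined and simply drops the high bits.
inline int32_t shift_left_via_unsigned(int32_t value, unsigned shift) {
  uint32_t bits = static_cast<uint32_t>(value) << (shift & 31u);
  // Converting back is modular since C++20 and two's complement in practice
  // on the compilers commonly used before that.
  return static_cast<int32_t>(bits);
}

// Hypothetical check: true if value * 2^shift is representable as int32_t,
// i.e. the shift loses no information and can be reversed.
inline bool shift_left_fits(int32_t value, unsigned shift) {
  int64_t wide = static_cast<int64_t>(value) * (int64_t{1} << (shift & 31u));
  return wide >= INT32_MIN && wide <= INT32_MAX;
}

// Hypothetical checked variant: asserts in debug builds instead of silently
// wrapping.
inline int32_t shift_left_checked(int32_t value, unsigned shift) {
  assert(shift_left_fits(value, shift) && "left shift would overflow");
  return shift_left_via_unsigned(value, shift);
}
```

With these assumptions, a negative offset such as -32768 shifted by 16 still fits, while +32768 shifted by 16 does not, which is the kind of overflow a checked variant would catch.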
------------- PR Comment: https://git.openjdk.org/jdk/pull/24196#issuecomment-2749953414 From dlong at openjdk.org Tue Mar 25 03:43:05 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 25 Mar 2025 03:43:05 GMT Subject: RFR: 8352141: UBSAN: fix the left shift of negative value in relocInfo.cpp, internal_word_Relocation::pack_data_to() In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 13:18:25 GMT, Afshin Zafari wrote: > The `offset` variable used in left-shift op can be a large number with its sign-bit set. This makes a negative value which is UB for left-shift. Using `java_left_shif()` function is the workaround to avoid UB. This function uses reinterpret_cast to cast from signed to unsigned and back. > > Tests: > linux-x64-debug tier1 on a UBSAN enabled build. It feels like we should be checking for overflow here, rather than hiding it with java_left_shift. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24196#issuecomment-2749985995 From dlong at openjdk.org Tue Mar 25 03:51:06 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 25 Mar 2025 03:51:06 GMT Subject: RFR: 8352141: UBSAN: fix the left shift of negative value in relocInfo.cpp, internal_word_Relocation::pack_data_to() In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 13:18:25 GMT, Afshin Zafari wrote: > The `offset` variable used in left-shift op can be a large number with its sign-bit set. This makes a negative value which is UB for left-shift. Using `java_left_shif()` function is the workaround to avoid UB. This function uses reinterpret_cast to cast from signed to unsigned and back. > > Tests: > linux-x64-debug tier1 on a UBSAN enabled build. Would it be useful to instead use something new like left_shift_no_overflow()? 
It would assert if the operation is not reversible because of overflow, and I believe it could be implemented efficiently with val * (1<>1)+3)) > shift) failed: Invalid Shift value [v2] In-Reply-To: References: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> Message-ID: On Mon, 24 Mar 2025 10:34:04 GMT, Emanuel Peter wrote: > I think this is ok as is. But I would like @jatin-bhateja to have a quick look as well :) Sure. Hi @jatin-bhateja , could you please help take a look at this change? Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24051#issuecomment-2750134609 From xgong at openjdk.org Tue Mar 25 06:11:21 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 25 Mar 2025 06:11:21 GMT Subject: Integrated: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 01:18:57 GMT, Xiaohong Gong wrote: > The AArch64 vector rearrange implementation currently lacks support for vector types with lane counts < 4 (see [1]). This limitation results in significant performance gaps when running Long/Double vector benchmarks on NVIDIA Grace (SVE2 architecture with 128-bit vectors) compared to other SVE and x86 platforms. > > Vector rearrange operations depend on vector shuffle inputs, which used byte array as payload previously. The minimum vector lane count of 4 for byte type on AArch64 imposed this limitation on rearrange operations. However, vector shuffle payload has been updated to use vector-specific data types (e.g., `int` for `IntVector`) (see [2]). This change enables us to remove the lane count restriction for vector rearrange operations. > > This patch added the rearrange support for vector types with small lane count. 
Here are the main changes: > - Added AArch64 match rule support for `VectorRearrange` with smaller lane counts (e.g., `2D/2S`) > - Relocated NEON implementation from ad file to c2 macro assembler file for better handling of complex implementation > - Optimized temporary register usage in NEON implementation for short/int/float types from two registers to one > > Following is the performance improvement data of several Vector API JMH benchmarks, on a NVIDIA Grace CPU with NEON and SVE. Performance of the same JMH with other vector types remains unchanged. > > 1) NEON > > JMH on panama-vector:vectorIntrinsics: > > Benchmark (size) Mode Cnt Units Before After Gain > Double128Vector.rearrange 1024 thrpt 30 ops/ms 78.060 578.859 7.42x > Double128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.332 1811.664 25.05x > Double128Vector.unsliceUnary 1024 thrpt 30 ops/ms 72.256 1812.344 25.08x > Float64Vector.rearrange 1024 thrpt 30 ops/ms 77.879 558.797 7.18x > Float64Vector.sliceUnary 1024 thrpt 30 ops/ms 70.528 1981.304 28.09x > Float64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.735 1994.168 27.79x > Int64Vector.rearrange 1024 thrpt 30 ops/ms 76.374 562.106 7.36x > Int64Vector.sliceUnary 1024 thrpt 30 ops/ms 71.680 1190.127 16.60x > Int64Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.895 1185.094 16.48x > Long128Vector.rearrange 1024 thrpt 30 ops/ms 78.902 579.250 7.34x > Long128Vector.sliceUnary 1024 thrpt 30 ops/ms 72.389 747.794 10.33x > Long128Vector.unsliceUnary 1024 thrpt 30 ops/ms 71.999 747.848 10.38x > > > JMH on jdk mainline: > > Benchmark ... This pull request has now been integrated. 
Changeset: 99c8a6e4 Author: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/99c8a6e47ac9b0659349a849940c27c626beb905 Stats: 508 lines in 6 files changed: 401 ins; 86 del; 21 mod 8350463: AArch64: Add vector rearrange support for small lane count vectors Reviewed-by: epeter, bkilambi, haosun ------------- PR: https://git.openjdk.org/jdk/pull/23790 From xgong at openjdk.org Tue Mar 25 06:11:21 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 25 Mar 2025 06:11:21 GMT Subject: RFR: 8350463: AArch64: Add vector rearrange support for small lane count vectors In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 12:17:07 GMT, Emanuel Peter wrote: >>> @XiaohongGong Tests launched! Please ping me after the weekend for the results >> >> Hi @eme64 , thanks for your testing! May I ask about the test results please? > > Thanks @XiaohongGong For the work Thanks for your testing and review @eme64 ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23790#issuecomment-2750182232 From fyang at openjdk.org Tue Mar 25 06:44:11 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 25 Mar 2025 06:44:11 GMT Subject: RFR: 8352607: RISC-V: use cmove in min/max when Zicond is supported In-Reply-To: References: Message-ID: <9o_gKmGW4-GPVv-JeIZvmMZaiipBYhCunGFFj6EIXVM=.f05e0c00-241a-4e46-8471-6ad9987f3934@github.com> On Fri, 21 Mar 2025 12:53:01 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > We can let min/max to use cmove if Zicond is supported rather than a branch. > At this same time, this patch also simplify the code of min/max. > > Thanks! Marked as reviewed by fyang (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/24153#pullrequestreview-2712552598 From fyang at openjdk.org Tue Mar 25 06:44:12 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 25 Mar 2025 06:44:12 GMT Subject: RFR: 8352607: RISC-V: use cmove in min/max when Zicond is supported In-Reply-To: <7Gpjkj3ilwHM-qjrinG_woBi-jwwjRVQS1zOjYNMVxQ=.e514a754-07c6-4a5f-bbe8-5cb23b28520f@github.com> References: <6Vx1NT54rOWXHIAj1Ug2Ike3A7bTmIYSOGzp09GA0pg=.1ccd1d10-fed8-4194-942d-a25d7c39b68b@github.com> <6shrH4rG_Ne1r7q99ptFITsnkZpo89ShLiXfOTkcfG0=.186f153b-7539-48d5-bb21-dce816f7f9f5@github.com> <7Gpjkj3ilwHM-qjrinG_woBi-jwwjRVQS1zOjYNMVxQ=.e514a754-07c6-4a5f-bbe8-5cb23b28520f@github.com> Message-ID: On Mon, 24 Mar 2025 16:19:24 GMT, Vixea wrote: >> I'm pretty sure the p550(hifive primier p550) doesn't support zicond but does support zbb, zba > > Umm anyway the p550(hifive primier and megrez don't support zicond but do support zbb/zba. > https://github.com/llvm/llvm-project/commit/5d03235c73476dfa3d2dd48c76de106fd1aa2ac7 > Oh heh I got the question backwards then in that case idk too. > But, anyway in fact you can just consider this as a code cleanup, in this sense seems it should be good? Seems fine to me in that respect. We can fix the cost in another PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24153#discussion_r2011417580 From chagedorn at openjdk.org Tue Mar 25 07:11:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Mar 2025 07:11:13 GMT Subject: RFR: 8352595: Regression of JDK-8314999 in IR matching [v3] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 14:35:36 GMT, Marc Chevalier wrote: >> A lot of tests for the IR framework used `ALLOC` and friends as a check that would run on the Opto assembly by default, but can also run earlier, but that's no longer the case. 
>> >> There were two kinds of tests to fix: the ones rather about `ALLOC`, where the used or expected compile phases have to change, and the tests where `ALLOC` were just a check that would run on opto assembly. For this, I tried to keep the spirit of the test using other regexes made for this stage. > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into fix/fix-IRframework-test > - wip > - wip > - Fix TestCompilePhaseCollector.java > - Fix TestPhaseIRMatching.java > - Fix TestBadFormat Thanks for the update, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24163#pullrequestreview-2712638337 From thartmann at openjdk.org Tue Mar 25 07:21:10 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 25 Mar 2025 07:21:10 GMT Subject: RFR: 8352595: Regression of JDK-8314999 in IR matching [v3] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 14:35:36 GMT, Marc Chevalier wrote: >> A lot of tests for the IR framework used `ALLOC` and friends as a check that would run on the Opto assembly by default, but can also run earlier, but that's no longer the case. >> >> There were two kinds of tests to fix: the ones rather about `ALLOC`, where the used or expected compile phases have to change, and the tests where `ALLOC` were just a check that would run on opto assembly. For this, I tried to keep the spirit of the test using other regexes made for this stage. > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into fix/fix-IRframework-test > - wip > - wip > - Fix TestCompilePhaseCollector.java > - Fix TestPhaseIRMatching.java > - Fix TestBadFormat Looks good to me too. Ship it! ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24163#pullrequestreview-2712657320 From epeter at openjdk.org Tue Mar 25 07:23:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 07:23:13 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v42] In-Reply-To: References: <9V-aL5zZgNXWDXlHl3QB8brQGPhIKRmX7kXdlp2Z6lo=.cb88be68-f15d-4fd9-a4e0-be7952731e2f@github.com> Message-ID: On Mon, 24 Mar 2025 17:51:25 GMT, Johannes Graham wrote: >> I just hope that the abstraction here does not invalidate our intent to have constant min/max bounds for the clamping ;) > > It does the right thing with the constants - `testFoldableRange` would fail otherwise. 
Ah, you are right, great :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23089#discussion_r2011469850 From jbhateja at openjdk.org Tue Mar 25 07:25:10 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 25 Mar 2025 07:25:10 GMT Subject: RFR: 8351627: C2 AArch64 ROR/ROL: assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value [v2] In-Reply-To: References: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> Message-ID: On Tue, 18 Mar 2025 03:51:55 GMT, Xiaohong Gong wrote: >> The following assertion fails on AArch64: >> >> >> Internal Error (jdk/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:2991), pid=3822987, tid=3823007 >> assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value >> >> >> with a simple Vector API case: >> >> public static IntVector test() { >> IntVector iv = IntVector.zero(IntVector.SPECIES_128); >> return iv.lanewise(VectorOperators.ROR, iv); >> } >> >> >> On AArch64, vector `ROR/ROL` (rotate right/left) operations are implemented with a combination of shifts. Please see the pattern for `ROR`: >> >> >> lsr dst1, src, cnt // unsigned right shift >> lsl dst2, src, bitSize - cnt // left shift >> orr dst, dst1, dst2 // logical or >> >> where `bitSize` is the element type width (e.g. `32` for `int`). In above case, `cnt` is a zero constant, resulting in a left shift of 32 (`bitSize - 0`), which exceeds the instruction's valid shift count range and triggers the assertion. To fix this, we need to mask the shift count to ensure it stays within valid range when calculating shift counts for rotate operations: `shiftCnt = shiftCnt & (bitSize - 1)`. >> >> Note that the mask is only necessary for constant shift counts. This not only fixes the assertion failure, but also allows `ROR/ROL src, 0` to be optimized to `src` directly. >> >> For vector variables as shift counts, the masking can be safely omitted because: >> 1. 
Vector shift instructions that take a vector register as the shift count may not automatically apply modulo arithmetic based on register size. When the shift count is `32` for int type, the result may be either `zeros` or `src`. However, this doesn't affect correctness for rotate since the final result is combined with `src` using a logical `OR` operation. >> 2. It saves a vector logical `AND` for masking, which is friendly to the performance. > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Update the test case I don't have access to Grace :-) but I have verified your fix over Graviton3. LGTM. src/hotspot/share/opto/vectornode.cpp line 1687: > 1685: shiftRCnt = phase->transform(new AndINode(cnt, phase->intcon(shift_mask))); > 1686: shiftLCnt = phase->transform(new SubINode(phase->intcon(shift_mask + 1), shiftRCnt)); > 1687: shiftLCnt = phase->transform(new AndINode(shiftLCnt, phase->intcon(shift_mask))); FTR, Java-side implementation ensures rounding off the shift count https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L969 This is the edge case which does not impact x86 as instruction semantics are clear about setting the register to zero if [shift count is greater than respective lane count](https://www.felixcloutier.com/x86/psllw:pslld:psllq#:~:text=If%20the%20value%20specified%20by%20the%20count%20operand%20is%20greater%20than%2015%20(for%20words)%2C%2031%20(for%20doublewords)%2C%20or%2063%20(for%20a%20quadword)%2C%20then%20the%20destination%20operand%20is%20set%20to%20all%200s.%20Figure%204%2D17%20gives%20an%20example%20of%20shifting%20words%20in%20a%2064%2Dbit%20operand.) Thus right shifted vector by 0 shift count when ORed with all zero left-shited vector by 32 shift count still gives the correct result. ------------- Marked as reviewed by jbhateja (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24051#pullrequestreview-2712495498 PR Review Comment: https://git.openjdk.org/jdk/pull/24051#discussion_r2011382909 From epeter at openjdk.org Tue Mar 25 07:30:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 07:30:15 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v44] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 18:29:34 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. >> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. > > Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: > > Undo accidental changes to Int tests Tests from yesterday passed. We only had minor changes, so I'll trust the GitHub Actions for testing. @j3graham Thanks for your work on this, and all the updates during the review ? ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23089#pullrequestreview-2712676818 From duke at openjdk.org Tue Mar 25 07:31:19 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 25 Mar 2025 07:31:19 GMT Subject: RFR: 8352595: Regression of JDK-8314999 in IR matching [v3] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 14:35:36 GMT, Marc Chevalier wrote: >> A lot of tests for the IR framework used `ALLOC` and friends as a check that would run on the Opto assembly by default, but can also run earlier, but that's no longer the case. >> >> There were two kinds of tests to fix: the ones rather about `ALLOC`, where the used or expected compile phases have to change, and the tests where `ALLOC` were just a check that would run on opto assembly. For this, I tried to keep the spirit of the test using other regexes made for this stage. > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into fix/fix-IRframework-test > - wip > - wip > - Fix TestCompilePhaseCollector.java > - Fix TestPhaseIRMatching.java > - Fix TestBadFormat Let's :shipit: indeed! Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24163#issuecomment-2750331437 From duke at openjdk.org Tue Mar 25 07:31:20 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 25 Mar 2025 07:31:20 GMT Subject: Integrated: 8352595: Regression of JDK-8314999 in IR matching In-Reply-To: References: Message-ID: <9UpLvvqt2Syg5IXF7FQI5vjshoTjY1h6wVKc3DK8-SI=.a6ab0aa7-ab7d-4f20-af63-69892c1fb7da@github.com> On Fri, 21 Mar 2025 15:41:23 GMT, Marc Chevalier wrote: > A lot of tests for the IR framework used `ALLOC` and friends as a check that would run on the Opto assembly by default, but can also run earlier, but that's no longer the case. 
> > There were two kinds of tests to fix: the ones rather about `ALLOC`, where the used or expected compile phases have to change, and the tests where `ALLOC` were just a check that would run on opto assembly. For this, I tried to keep the spirit of the test using other regexes made for this stage. This pull request has now been integrated. Changeset: c94bc742 Author: Marc Chevalier Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/c94bc7427ce86dce9613d3a201eef7f3828447b0 Stats: 89 lines in 3 files changed: 20 ins; 2 del; 67 mod 8352595: Regression of JDK-8314999 in IR matching Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24163 From xgong at openjdk.org Tue Mar 25 07:32:14 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 25 Mar 2025 07:32:14 GMT Subject: RFR: 8351627: C2 AArch64 ROR/ROL: assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value [v2] In-Reply-To: References: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> Message-ID: <30ATU5oS61q06CLQaQWzB26nvgPCBGBOk6wcEeTvZ4U=.3dd66ce5-6971-4245-a483-22793f657f2c@github.com> On Tue, 25 Mar 2025 07:22:47 GMT, Jatin Bhateja wrote: > I don't have access to Grace :-) but I have verified your fix over Graviton3. > > LGTM. Thanks for your testing and review! > FTR, Java-side implementation ensures rounding off the shift count https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L969 Thanks for your review @jatin-bhateja ! Yes, the Java-side would make the masking before it goes to hotspot. This issue happens because the initial shift count in java-size is `0`, and `32` is generated by `shift_mask + 1 - 0`. So an additional mask is needed to the new generated shift count. 
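As a scalar illustration of the count computation described above, the following sketch models a rotate-right on one 32-bit lane built from two shifts and an OR. The function name `rotate_right_32` is hypothetical, and the code is a model of the arithmetic only, not the C2 node transformation or the AArch64 instruction sequence.

```cpp
#include <cstdint>

// Hypothetical scalar model of a rotate-right built from two shifts and an OR.
inline uint32_t rotate_right_32(uint32_t src, uint32_t cnt) {
  const uint32_t shift_mask = 31;                 // bitSize - 1 for int lanes
  const uint32_t right_cnt = cnt & shift_mask;    // count for the right shift
  // (shift_mask + 1 - right_cnt) is 32 when right_cnt is 0, which is out of
  // range for a 32-bit shift; the second mask maps it back to 0.
  const uint32_t left_cnt = (shift_mask + 1 - right_cnt) & shift_mask;
  return (src >> right_cnt) | (src << left_cnt);
}
```

With a count of 0 both shift counts become 0, so the result is `src | src`, i.e. `src` unchanged, and no shift count ever reaches the lane width.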
> This is the edge case which does not impact x86 as instruction semantics are clear about setting the register to zero if [shift count is greater than respective lane count](https://www.felixcloutier.com/x86/psllw:pslld:psllq#:~:text=If%20the%20value%20specified%20by%20the%20count%20operand%20is%20greater%20than%2015%20(for%20words)%2C%2031%20(for%20doublewords)%2C%20or%2063%20(for%20a%20quadword)%2C%20then%20the%20destination%20operand%20is%20set%20to%20all%200s.%20Figure%204%2D17%20gives%20an%20example%20of%20shifting%20words%20in%20a%2064%2Dbit%20operand.) > > Thus right shifted vector by 0 shift count when ORed with all zero left-shited vector by 32 shift count still gives the correct result. That would be good. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24051#issuecomment-2750337201 PR Review Comment: https://git.openjdk.org/jdk/pull/24051#discussion_r2011478714 From epeter at openjdk.org Tue Mar 25 07:33:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 07:33:26 GMT Subject: RFR: 8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 20:34:30 GMT, Christian Hagedorn wrote: >> This was a fuzzer failure, which hit an assert in SuperWord: >> >> `# assert(_cl->is_multiversion_fast_loop() == (_multiversioning_fast_proj != nullptr)) failed: must find the multiversion selector IFF loop is a multiversion fast loop` >> >> We had a fast main loop, but it could not find the `multiversion_if`. The reason was that the loop was a `PeelMainPost` loop, i.e. there is no pre-loop but only a single peeled iteration. This makes the pattern matching from main-loop via pre-loop to `multiversion_if` impossible. >> >> I'm proposing two changes in this PR: >> - We must check `peel_only`, to see if we are in a `PeelMainPost` or `PreMainPost` case, and only do multiversioning if we know that there will be a pre-loop. 
>> - In `eliminate_useless_multiversion_if` we should already detect that a main-loop that is marked as multiversioned should be able to find its `multiversion_if`. I'm removing its multiversioning marking if we cannot find the `multiversion_if`. >> >> I added 2 tests: >> - The fuzzer generated test that hits the assert before this patch. >> - An IR test that checks that we do not multiversion in a `PeelMainPost` loop case. >> >> --------------- >> >> **FYI**: I tried to add an assert in `eliminate_useless_multiversion_if` that we must always find the `multiversion_if` from a multiversioned main loop. But there are cases where this can fail. Here an example: >> >> `test/hotspot/jtreg/compiler/locks/TestSynchronizeWithEmptyBlock.java` >> >> With flags: `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation` >> >> >> Counted Loop: N537/N176 counted [int,100),+1 (-1 iters) >> Loop: N0/N0 has_sfpt >> Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 495 } >> Loop: N536/N535 >> Loop: N537/N176 counted [int,100),+1 (-1 iters) has_sfpt strip_mined >> Loop: N379/N383 limit_check profile_predicated predicated counted [int,int),+1 (4 iters) pre rc has_sfpt >> Loop: N353/N357 counted [int,1000),+1 (4 iters) post rc has_sfpt >> Multiversion Loop: N537/N176 counted [int,100),+1 (100 iters) has_sfpt strip_mined >> PreMainPost Loop: N537/N176 counted [int,100),+1 (100 iters) multiversion_fast has_sfpt strip_mined >> Unroll 2 Loop: N537/N176 counted [int,100),+1 (100 iters) main multiversion_fast has_sfpt strip_mined >> Poor node estimate: 306 >> 92 >> Loop: N0/N0 has_sfpt >> Loop: N307/N361 limit_check profile_predicated predicated sfpts={... > > src/hotspot/share/opto/loopTransform.cpp line 3435: > >> 3433: if (!peel_only) { >> 3434: // We are going to add pre-loop and post-loop (PreMainPost). 
>> 3435: // But should we also multi-version for auto-vectorization speculative > > Just a side note: It seems that we sometimes say "multiversion" and sometimes "multi-version". I find the latter more readable. Maybe we can make it consistent at some point in a separate task. This could also mean that we have `*_multi_version_*` in method names instead of `*_multiversion*_`. Hmm... good catch. Personally, I prefer "multiversion" as a single word. But this is very subjective. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24183#discussion_r2011481992 From epeter at openjdk.org Tue Mar 25 07:43:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 07:43:58 GMT Subject: RFR: 8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops [v2] In-Reply-To: References: Message-ID: <4LPkDqyYa05a3yQbSm0oyfnlFi9cNoerTKMN8GiAjKo=.cf03b610-3402-4bd7-87bd-d4d4df7d30e7@github.com> On Tue, 25 Mar 2025 07:30:42 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopTransform.cpp line 3435: >> >>> 3433: if (!peel_only) { >>> 3434: // We are going to add pre-loop and post-loop (PreMainPost). >>> 3435: // But should we also multi-version for auto-vectorization speculative >> >> Just a side note: It seems that we sometimes say "multiversion" and sometimes "multi-version". I find the latter more readable. Maybe we can make it consistent at some point in a separate task. This could also mean that we have `*_multi_version_*` in method names instead of `*_multiversion*_`. > > Hmm... good catch. Personally, I prefer "multiversion" as a single word. But this is very subjective. I have now renamed the 2 occurrences of `multi-version` to `multiversion`, since that was a minimal diff. We could still file an RFE to rename everything later.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24183#discussion_r2011493707 From epeter at openjdk.org Tue Mar 25 07:43:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 07:43:58 GMT Subject: RFR: 8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops [v2] In-Reply-To: References: Message-ID: > This was a fuzzer failure, which hit an assert in SuperWord: > > `# assert(_cl->is_multiversion_fast_loop() == (_multiversioning_fast_proj != nullptr)) failed: must find the multiversion selector IFF loop is a multiversion fast loop` > > We had a fast main loop, but it could not find the `multiversion_if`. The reason was that the loop was a `PeelMainPost` loop, i.e. there is no pre-loop but only a single peeled iteration. This makes the pattern matching from main-loop via pre-loop to `multiversion_if` impossible. > > I'm proposing two changes in this PR: > - We must check `peel_only`, to see if we are in a `PeelMainPost` or `PreMainPost` case, and only do multiversioning if we know that there will be a pre-loop. > - In `eliminate_useless_multiversion_if` we should already detect that a main-loop that is marked as multiversioned should be able to find its `multiversion_if`. I'm removing its multiversioning marking if we cannot find the `multiversion_if`. > > I added 2 tests: > - The fuzzer generated test that hits the assert before this patch. > - An IR test that checks that we do not multiversion in a `PeelMainPost` loop case. > > --------------- > > **FYI**: I tried to add an assert in `eliminate_useless_multiversion_if` that we must always find the `multiversion_if` from a multiversioned main loop. But there are cases where this can fail. 
Here an example: > > `test/hotspot/jtreg/compiler/locks/TestSynchronizeWithEmptyBlock.java` > > With flags: `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation` > > > Counted Loop: N537/N176 counted [int,100),+1 (-1 iters) > Loop: N0/N0 has_sfpt > Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 495 } > Loop: N536/N535 > Loop: N537/N176 counted [int,100),+1 (-1 iters) has_sfpt strip_mined > Loop: N379/N383 limit_check profile_predicated predicated counted [int,int),+1 (4 iters) pre rc has_sfpt > Loop: N353/N357 counted [int,1000),+1 (4 iters) post rc has_sfpt > Multiversion Loop: N537/N176 counted [int,100),+1 (100 iters) has_sfpt strip_mined > PreMainPost Loop: N537/N176 counted [int,100),+1 (100 iters) multiversion_fast has_sfpt strip_mined > Unroll 2 Loop: N537/N176 counted [int,100),+1 (100 iters) main multiversion_fast has_sfpt strip_mined > Poor node estimate: 306 >> 92 > Loop: N0/N0 has_sfpt > Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 } > Loop: N556/N557 sfpts={ 559 } > Loop: N552/N554 counted... 
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - rename - Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24183/files - new: https://git.openjdk.org/jdk/pull/24183/files/1ab4f731..c21d2ced Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24183&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24183&range=00-01 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24183.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24183/head:pull/24183 PR: https://git.openjdk.org/jdk/pull/24183 From epeter at openjdk.org Tue Mar 25 07:50:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 07:50:12 GMT Subject: RFR: 8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 17:50:30 GMT, Vladimir Kozlov wrote: >> This was a fuzzer failure, which hit an assert in SuperWord: >> >> `# assert(_cl->is_multiversion_fast_loop() == (_multiversioning_fast_proj != nullptr)) failed: must find the multiversion selector IFF loop is a multiversion fast loop` >> >> We had a fast main loop, but it could not find the `multiversion_if`. The reason was that the loop was a `PeelMainPost` loop, i.e. there is no pre-loop but only a single peeled iteration. This makes the pattern matching from main-loop via pre-loop to `multiversion_if` impossible. >> >> I'm proposing two changes in this PR: >> - We must check `peel_only`, to see if we are in a `PeelMainPost` or `PreMainPost` case, and only do multiversioning if we know that there will be a pre-loop. >> - In `eliminate_useless_multiversion_if` we should already detect that a main-loop that is marked as multiversioned should be able to find its `multiversion_if`. I'm removing its multiversioning marking if we cannot find the `multiversion_if`. 
>> >> I added 2 tests: >> - The fuzzer generated test that hits the assert before this patch. >> - An IR test that checks that we do not multiversion in a `PeelMainPost` loop case. >> >> --------------- >> >> **FYI**: I tried to add an assert in `eliminate_useless_multiversion_if` that we must always find the `multiversion_if` from a multiversioned main loop. But there are cases where this can fail. Here is an example: >> >> `test/hotspot/jtreg/compiler/locks/TestSynchronizeWithEmptyBlock.java` >> >> With flags: `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation` >> >> >> Counted Loop: N537/N176 counted [int,100),+1 (-1 iters) >> Loop: N0/N0 has_sfpt >> Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 495 } >> Loop: N536/N535 >> Loop: N537/N176 counted [int,100),+1 (-1 iters) has_sfpt strip_mined >> Loop: N379/N383 limit_check profile_predicated predicated counted [int,int),+1 (4 iters) pre rc has_sfpt >> Loop: N353/N357 counted [int,1000),+1 (4 iters) post rc has_sfpt >> Multiversion Loop: N537/N176 counted [int,100),+1 (100 iters) has_sfpt strip_mined >> PreMainPost Loop: N537/N176 counted [int,100),+1 (100 iters) multiversion_fast has_sfpt strip_mined >> Unroll 2 Loop: N537/N176 counted [int,100),+1 (100 iters) main multiversion_fast has_sfpt strip_mined >> Poor node estimate: 306 >> 92 >> Loop: N0/N0 has_sfpt >> Loop: N307/N361 limit_check profile_predicated predicated sfpts={... > >> If reviewers think this really should be investigated, I could file a follow-up RFE. > > Yes, please. Can you check without "strip mining" if we can eliminate this loop? @vnkozlov Thanks for your review!
@chhagedorn I addressed all your comments :) I filed: [JDK-8352819](https://bugs.openjdk.org/browse/JDK-8352819) C2 SuperWord: add assert to eliminate_useless_multiversion_if ------------- PR Comment: https://git.openjdk.org/jdk/pull/24183#issuecomment-2750371887 From shade at openjdk.org Tue Mar 25 08:01:20 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Mar 2025 08:01:20 GMT Subject: RFR: 8352696: JFR: assert(false): EA: missing memory path [v5] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 19:23:56 GMT, Markus Grönlund wrote: >> Greetings, >> >> In debug builds, we can sometimes assert because C2's Escape Analysis does not recognize a pattern where one input of memory Phi node is a MergeMem node, and another is a RAW store. >> >> This pattern is created by the jdk.jfr.internal.JVM.commit() intrinsic, which is inlined because of inlining JFR events, for example jdk.VirtualThreadStart. >> >> As a result, EA complains about a strange memory graph. >> >> Testing: jdk_jfr >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > simplified test Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24192#pullrequestreview-2712743344 From epeter at openjdk.org Tue Mar 25 08:01:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 08:01:21 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v6] In-Reply-To: References: Message-ID: On Sat, 22 Mar 2025 17:55:27 GMT, Jatin Bhateja wrote: >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 >> >> The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). >> >> Summary of changes included with the patch: >> 1. C2 compiler New Vector IR creation. >> 2. Auto-vectorization support. >> 3. x86 backend implementation. >> 4.
New IR verification test for each newly supported vector operation. >> >> Following are the performance numbers of Float16OperationsBenchmark >> >> System : Intel(R) Xeon(R) Processor code-named Granite Rapids >> Frequency fixed at 2.5 GHz >> >> >> Baseline >> Benchmark (vectorDim) Mode Cnt Score Error Units >> Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms >> Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms >> Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms >> Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms >> Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms >> Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms >> Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms >> Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms >> Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms >> Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms >> Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms >> Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms >> Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms >> Float16OperationsBenchmark.isInfiniteCMovBen... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Removing Generator dependency on incubation module I looked at the changes in `Generators.java`, thanks for adding some code there. Some comments on it: - You should add some Float16 tests to `test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java`.
- I am missing the "mixed distribution" function `float16s()`. As a reference, take `public Generator doubles()`. The idea is that we have a set of distributions, and we pick a random distribution every time in the tests. - I'm also missing a "any bits" version, where you would take a random short value and reinterpret it as `Float16`. This ensures that we are getting all possible encodings, including multiple NaN encodings. - All of this is probably enough code to make a separate PR. test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 74: > 72: short min_value = float16ToRawShortBits(Float16.MIN_VALUE); > 73: short max_value = float16ToRawShortBits(Float16.MAX_VALUE); > 74: Generator gen = G.mixedWithSpecialFloat16s(G.uniformFloat16s(min_value, max_value), 10, 2); Here you would simply be using the `float16s` random distribution picker. Sometimes you would get uniform, sometimes special, sometimes mixed, sometimes any-bits, etc. ------------- PR Review: https://git.openjdk.org/jdk/pull/22755#pullrequestreview-2712740608 PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r2011516136 From chagedorn at openjdk.org Tue Mar 25 08:02:34 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Mar 2025 08:02:34 GMT Subject: RFR: 8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops [v2] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 07:43:58 GMT, Emanuel Peter wrote: >> This was a fuzzer failure, which hit an assert in SuperWord: >> >> `# assert(_cl->is_multiversion_fast_loop() == (_multiversioning_fast_proj != nullptr)) failed: must find the multiversion selector IFF loop is a multiversion fast loop` >> >> We had a fast main loop, but it could not find the `multiversion_if`. The reason was that the loop was a `PeelMainPost` loop, i.e. there is no pre-loop but only a single peeled iteration. 
This makes the pattern matching from main-loop via pre-loop to `multiversion_if` impossible. >> >> I'm proposing two changes in this PR: >> - We must check `peel_only`, to see if we are in a `PeelMainPost` or `PreMainPost` case, and only do multiversioning if we know that there will be a pre-loop. >> - In `eliminate_useless_multiversion_if` we should already detect that a main-loop that is marked as multiversioned should be able to find its `multiversion_if`. I'm removing its multiversioning marking if we cannot find the `multiversion_if`. >> >> I added 2 tests: >> - The fuzzer generated test that hits the assert before this patch. >> - An IR test that checks that we do not multiversion in a `PeelMainPost` loop case. >> >> --------------- >> >> **FYI**: I tried to add an assert in `eliminate_useless_multiversion_if` that we must always find the `multiversion_if` from a multiversioned main loop. But there are cases where this can fail. Here an example: >> >> `test/hotspot/jtreg/compiler/locks/TestSynchronizeWithEmptyBlock.java` >> >> With flags: `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation` >> >> >> Counted Loop: N537/N176 counted [int,100),+1 (-1 iters) >> Loop: N0/N0 has_sfpt >> Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 495 } >> Loop: N536/N535 >> Loop: N537/N176 counted [int,100),+1 (-1 iters) has_sfpt strip_mined >> Loop: N379/N383 limit_check profile_predicated predicated counted [int,int),+1 (4 iters) pre rc has_sfpt >> Loop: N353/N357 counted [int,1000),+1 (4 iters) post rc has_sfpt >> Multiversion Loop: N537/N176 counted [int,100),+1 (100 iters) has_sfpt strip_mined >> PreMainPost Loop: N537/N176 counted [int,100),+1 (100 iters) multiversion_fast has_sfpt strip_mined >> Unroll 2 Loop: N537/N176 counted [int,100),+1 (100 iters) main multiversion_fast has_sfpt strip_mined >> Poor node estimate: 306 >> 92 >> Loop: N0/N0 has_sfpt >> Loop: N307/N361 limit_check profile_predicated 
predicated sfpts={... > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - rename > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Update looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24183#pullrequestreview-2712745145 From chagedorn at openjdk.org Tue Mar 25 08:02:34 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Mar 2025 08:02:34 GMT Subject: RFR: 8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops [v2] In-Reply-To: <4LPkDqyYa05a3yQbSm0oyfnlFi9cNoerTKMN8GiAjKo=.cf03b610-3402-4bd7-87bd-d4d4df7d30e7@github.com> References: <4LPkDqyYa05a3yQbSm0oyfnlFi9cNoerTKMN8GiAjKo=.cf03b610-3402-4bd7-87bd-d4d4df7d30e7@github.com> Message-ID: <-NOGbwWh3JyQcsQuj-m3wJlMccM4f5gPtRh4O0Gm8sg=.5fcdda4a-91bc-4d1a-8c5f-f1fc085fcc3c@github.com> On Tue, 25 Mar 2025 07:40:30 GMT, Emanuel Peter wrote: >> Hmm... good catch. Personally, I prefer "multiversion" as a single word. But this is very subjective. > > I have now renamed the 2 occurrences of `multi-version` to `multiversion`, since that was a minimal diff. We could still file an RFE to rename everything later. Your call on what you like better since you introduced the concept :-) It should just be consistent. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24183#discussion_r2011516723 From duke at openjdk.org Tue Mar 25 08:03:21 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 25 Mar 2025 08:03:21 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v17] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 13:09:02 GMT, Marc Chevalier wrote: >> This collapses two consecutive left shifts by constants into a single left shift: (x << con1) << con2 => x << (con1 + con2). Care must be taken when con1 + con2 is at least the number of bits in the integer type.
In this case, we must simplify to 0. >> >> Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32-bit integer into a 16-bit integer, with sign extension. When storing this into a 16-bit field, this can be simplified to just `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR look like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The previous logic didn't handle the case where the left and the right shift have different magnitudes. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed to avoid missing vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. >> >> This also works for multiplications by powers of 2 since they are already translated into shifts. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 31 additional commits since the last revision: > > - Merge branch 'master' into fix/missing-transformation-for-chain-of-shifts-multiplications-by-constants > - Rephrase comment > - more checks > - order > - rephrase > - correct > - s > - rephrased corner case > - rephrase > - char -> byte > - ... and 21 more: https://git.openjdk.org/jdk/compare/30067c8b...cd0b0c09 I've merged master into it and ran more tests. All seems good. Thanks for the many reviews!
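For readers following the thread, the identities in the PR description can be checked with a small Java sketch. It is illustrative only and is not the C2 code from the PR; the class and method names (`ShiftChainSketch`, `collapse`) are made up for this example, and it assumes `int` operands with both constants in the range 0..31.

```java
// Illustrative sketch; ShiftChainSketch and collapse are hypothetical names.
final class ShiftChainSketch {
    // Value of (x << con1) << con2 computed as a single shift,
    // assuming 0 <= con1, con2 <= 31.
    public static int collapse(int x, int con1, int con2) {
        int total = con1 + con2;
        // Java masks int shift counts to the low 5 bits, so x << 32 == x.
        // The chained form has shifted every bit out by then, so the
        // collapsed form must be the constant 0 instead.
        return total >= 32 ? 0 : x << total;
    }

    public static void main(String[] args) {
        int x = 0x12345678;
        System.out.println(((x << 3) << 4) == collapse(x, 3, 4));     // true
        System.out.println(((x << 20) << 12) == collapse(x, 20, 12)); // true
        System.out.println((x << (20 + 12)) == x);                     // true: why the special case exists

        // Sign-extension idiom followed by a 16-bit store: only the low
        // 16 bits survive, so (y << 19) >> 16 stores the same value as y << 3.
        int y = 0x7FFF1234;
        System.out.println((short) ((y << 19) >> 16) == (short) (y << 3)); // true
    }
}
```

The third line is the reason a plain `x << (con1 + con2)` is not enough once the sum reaches the width of the type.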
------------- PR Comment: https://git.openjdk.org/jdk/pull/23728#issuecomment-2750392380 From duke at openjdk.org Tue Mar 25 08:03:22 2025 From: duke at openjdk.org (duke) Date: Tue, 25 Mar 2025 08:03:22 GMT Subject: RFR: 8347459: C2: missing transformation for chain of shifts/multiplications by constants [v17] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 13:09:02 GMT, Marc Chevalier wrote: >> This collapses two consecutive left shifts by constants into a single left shift: (x << con1) << con2 => x << (con1 + con2). Care must be taken when con1 + con2 is at least the number of bits in the integer type. In this case, we must simplify to 0. >> >> Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32-bit integer into a 16-bit integer, with sign extension. When storing this into a 16-bit field, this can be simplified to just `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR look like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The previous logic didn't handle the case where the left and the right shift have different magnitudes. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed to avoid missing vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. >> >> This also works for multiplications by powers of 2 since they are already translated into shifts. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 31 additional commits since the last revision: > > - Merge branch 'master' into fix/missing-transformation-for-chain-of-shifts-multiplications-by-constants > - Rephrase comment > - more checks > - order > - rephrase > - correct > - s > - rephrased corner case > - rephrase > - char -> byte > - ... and 21 more: https://git.openjdk.org/jdk/compare/30067c8b...cd0b0c09 @marc-chevalier Your change (at version cd0b0c093fa2fe80d287cc0b5bd703ee2a4ad5b1) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23728#issuecomment-2750395127 From epeter at openjdk.org Tue Mar 25 08:27:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 08:27:13 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v2] In-Reply-To: References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: <_H5m7aIZW8wLR4GuXBd7XGOIFMdv3VHqw9Ef7xBQDQM=.27a0ebd3-4473-4bd1-aee5-7f9dc2dde878@github.com> On Wed, 19 Mar 2025 14:36:29 GMT, Christian Hagedorn wrote: >> The patch fixes the issue of creating an Initialized Assertion Predicate at a loop X from a Template Assertion Predicate that was originally created for a loop Y. Using the unrelated loop values from loop Y for the Initialized Assertion Predicate will let it fail during runtime and we execute a `halt` instruction. This was originally reported with [JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428). >> >> Note that most of the line changes are from new tests. >> >> ### The Problem >> There are multiple test cases triggering the same problem. In the following, when referring to "the test case", I'm referring to `testTemplateAssertionPredicateNotRemovedHalt()` which was written from scratch and contains more detailed comments explaining how we end up with executing a `Halt` node in more details. 
>> >> #### An Inner Loop without Parse Predicates >> The graph in `testTemplateAssertionPredicateNotRemovedHalt()` looks like this after creating `LoopNodes` for the outer `for` and inner `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7ac60e35-0b7e-4f04-b9dd-6eb8c8654a15) >> >> We only have Parse Predicates for the outer loop. Why? >> >> Before beautify loop, we have the following region which merges multiple backedges - the one from the `for` loop and the one from the `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7895161d-5ac1-46d6-93fe-5ab90ef24ab9) >> >> In `IdealLoopTree::merge_many_backedges()`, we notice that the hottest backedge is hot enough such that it is worth to have a separate merge point region for the inner and outer loop. We set everything up and eventually in `IdealLoopTree::split_outer_loop()`, we create a second `LoopNode`. >> >> For this inner `LoopNode`, we cannot set up `Parse Predicates` with the same UCTs as used for the outer loop. It would be incorrect when taking the trap to re-execute the inner and outer loop again while having already executed some of the outer loop's iterations. Thus, we get the graph shape with back-to-back `LoopNodes` as shown above. >> >> #### Predicates from a Folded Loop End up at Another Loop >> As described in the previous section, we have an inner and outer `LoopNode` while the inner does not have Parse Predicates. In a series of events (see test case comments for more details), we first hoist a range check out of the outer loop during Loop Predication with a Template Assertion Predicate. Then, we fold the outer loop away because we find that it is... > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: > > - Small things > - Fix test comments > - New approach: Marking unrelated Template Assertion Predicates outside of IGVN by storing a reference to the loop it originally was created for. > - Merge branch 'master' into JDK-8350579 > - Revert fix completely > - 8350579: Remove Template Assertion Predicates belonging to a > loop once it is folded away during IGVN Still approved. My comments were just suggestions / control questions. Thanks for answering. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23823#pullrequestreview-2712813189 From epeter at openjdk.org Tue Mar 25 08:27:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 08:27:14 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v2] In-Reply-To: References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: On Mon, 24 Mar 2025 15:31:03 GMT, Christian Hagedorn wrote: >> If not, could we assert something similar? > > I thought about somehow asserting that here as well. But the problem is that at this point, we already concatenated the original and the new loop together to represent one round of unrolling. So, we do not find the original loop exit check anymore from which we could have read the stride. That's why I explicitly take the cached `stride_con_before_unroll` and double it here. > > We could have maybe cached the original loop exit node somehow to query it. But I don't think it adds much value since it's as good as the original stride which was read from the loop exit node. Makes sense. The assert is not that important here.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2011555211 From epeter at openjdk.org Tue Mar 25 08:27:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 08:27:15 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v2] In-Reply-To: <3vNSau6-nBLDkGGPoq2ijj4lSKAlB4KQfW-Ys3heuTA=.98670082-0972-4921-8991-a9dfa39cfa14@github.com> References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> <3vNSau6-nBLDkGGPoq2ijj4lSKAlB4KQfW-Ys3heuTA=.98670082-0972-4921-8991-a9dfa39cfa14@github.com> Message-ID: On Mon, 24 Mar 2025 15:19:54 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/predicates.cpp line 1267: >> >>> 1265: template_assertion_predicate.opaque_node()->mark_useful(); >>> 1266: } >>> 1267: } >> >> I'm not sure if it makes sense to split this into two methods, but that's subjective. >> >> It seems to me that the code in `visit` is an optimization for what happens in `mark_template_useful_if_matching_loop`, and does not really make sense on its own. > > The reason I've split it is the following: > - The bailout for non-counted loops is actually separate from the marking. So I have a two-step algorithm: bailout + marking which can nicely be split. > - Having `mark_template_useful_if_matching_loop()` allows me to quickly read `visit()` and understand what's going on. Additionally, I can put the details about why we do the marking at the method comment for more interested code readers. Without the extracted method, I would probably need to put an extra "mark template useful if matching loop" comment + the 6 lines of comments at `mark_template_useful_if_matching_loop()` into the `visit()` method which makes it harder to grasp.
> > I would prefer to stick to what I have now - but I admit it's a subjective matter :-) I leave it up to you :) But could `opaque_node->loop_node() == _loop_node` even be true if we have `!_loop_node->is_CountedLoop()`? or do we actually need a CountedLoop to even have the match? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2011554694 From thartmann at openjdk.org Tue Mar 25 08:28:10 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 25 Mar 2025 08:28:10 GMT Subject: RFR: 8352696: JFR: assert(false): EA: missing memory path [v5] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 19:23:56 GMT, Markus Grönlund wrote: >> Greetings, >> >> In debug builds, we can sometimes assert because C2's Escape Analysis does not recognize a pattern where one input of memory Phi node is a MergeMem node, and another is a RAW store. >> >> This pattern is created by the jdk.jfr.internal.JVM.commit() intrinsic, which is inlined because of inlining JFR events, for example jdk.VirtualThreadStart. >> >> As a result, EA complains about a strange memory graph. >> >> Testing: jdk_jfr >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > simplified test Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24192#pullrequestreview-2712815298 From thartmann at openjdk.org Tue Mar 25 08:28:11 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 25 Mar 2025 08:28:11 GMT Subject: RFR: 8352696: JFR: assert(false): EA: missing memory path [v5] In-Reply-To: <5ASdY5jtvoX2wp-xEWbHSe_fLBiRuWklQra9T0MYG4U=.55746d71-6689-4212-8443-6a2cace9a8f1@github.com> References: <5ASdY5jtvoX2wp-xEWbHSe_fLBiRuWklQra9T0MYG4U=.55746d71-6689-4212-8443-6a2cace9a8f1@github.com> Message-ID: On Mon, 24 Mar 2025 12:23:18 GMT, Markus Grönlund wrote: >> What would it give? 
> > Disabling background compilation makes it more deterministic? Yes and you might need fewer runs to trigger compilation and wait for it to finish. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24192#discussion_r2011557304 From thartmann at openjdk.org Tue Mar 25 08:30:10 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 25 Mar 2025 08:30:10 GMT Subject: RFR: 8352490: Fatal error message for unhandled bytecode needs more detail [v2] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 14:19:37 GMT, Saranya Natarajan wrote: >> Description: Improve the error message for unhandled bytecode in `line#129` of function `Bytecodes::Code ciBytecodeStream::next_wide_or_table` in file ciStream.cpp >> >> Solution: The error message is improved to print OPCODE and bytecode index (BCI) > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > adding information for printing current method Looks good to me! I assume that you tested the error reporting by temporarily enabling it always. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24187#pullrequestreview-2712819252 From xgong at openjdk.org Tue Mar 25 08:30:20 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 25 Mar 2025 08:30:20 GMT Subject: Integrated: 8351627: C2 AArch64 ROR/ROL: assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value In-Reply-To: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> References: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> Message-ID: On Fri, 14 Mar 2025 09:43:15 GMT, Xiaohong Gong wrote: > The following assertion fails on AArch64: > > > Internal Error (jdk/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:2991), pid=3822987, tid=3823007 > assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value > > > with a simple Vector API case: > > public static IntVector test() { > IntVector iv = IntVector.zero(IntVector.SPECIES_128); > return iv.lanewise(VectorOperators.ROR, iv); > } > > > On AArch64, vector `ROR/ROL` (rotate right/left) operations are implemented with a combination of shifts. Please see the pattern for `ROR`: > > > lsr dst1, src, cnt // unsigned right shift > lsl dst2, src, bitSize - cnt // left shift > orr dst, dst1, dst2 // logical or > > where `bitSize` is the element type width (e.g. `32` for `int`). In above case, `cnt` is a zero constant, resulting in a left shift of 32 (`bitSize - 0`), which exceeds the instruction's valid shift count range and triggers the assertion. To fix this, we need to mask the shift count to ensure it stays within valid range when calculating shift counts for rotate operations: `shiftCnt = shiftCnt & (bitSize - 1)`. > > Note that the mask is only necessary for constant shift counts. This not only fixes the assertion failure, but also allows `ROR/ROL src, 0` to be optimized to `src` directly. > > For vector variables as shift counts, the masking can be safely omitted because: > 1. 
Vector shift instructions that take a vector register as the shift count may not automatically apply modulo arithmetic based on register size. When the shift count is `32` for int type, the result may be either `zeros` or `src`. However, this doesn't affect correctness for rotate since the final result is combined with `src` using a logical `OR` operation. > 2. It saves a vector logical `AND` for masking, which is friendly to the performance. This pull request has now been integrated. Changeset: f9bcef4d Author: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/f9bcef4dba569701ebed7762fc8730d552325382 Stats: 304 lines in 2 files changed: 303 ins; 0 del; 1 mod 8351627: C2 AArch64 ROR/ROL: assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value Reviewed-by: chagedorn, epeter, jbhateja, adinn ------------- PR: https://git.openjdk.org/jdk/pull/24051 From xgong at openjdk.org Tue Mar 25 08:30:19 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 25 Mar 2025 08:30:19 GMT Subject: RFR: 8351627: C2 AArch64 ROR/ROL: assert((1 << ((T>>1)+3)) > shift) failed: Invalid Shift value [v2] In-Reply-To: References: <3_xp0Rv26RRZlZvqCbj69UtZI3ywkgVUrlBTJZI_Ayo=.f4b4364e-feaa-4542-be35-1a09422c549c@github.com> Message-ID: On Wed, 19 Mar 2025 20:04:56 GMT, Christian Hagedorn wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Update the test case > > Looks good to me, too. Let's wait until the testing results from Emanuel are back. Thanks for the review @chhagedorn , @eme64 , @adinn and @jatin-bhateja ! 
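The constant-count masking described for 8351627 above can be sketched in plain Java. This is only an illustration under stated assumptions, not the HotSpot change itself: `RotateSketch`, `checkImmediateShift`, and `rotateRightConst` are hypothetical names, and the range check merely imitates an assembler that rejects immediate shift counts outside `[0, bitSize)`. Note that Java's own shift operators already wrap the count, so the explicit check is what stands in for the AArch64 encoding limit.

```java
public final class RotateSketch {
    private static final int BIT_SIZE = Integer.SIZE; // 32 for int lanes

    // Hypothetical stand-in for an immediate-shift encoder: like the assert
    // in the report, it rejects counts outside [0, BIT_SIZE).
    static void checkImmediateShift(int shift) {
        if (shift < 0 || shift >= BIT_SIZE) {
            throw new IllegalArgumentException("Invalid Shift value: " + shift);
        }
    }

    // Rotate right by a constant count, built from two shifts and an OR.
    static int rotateRightConst(int src, int cnt) {
        // Mask first: shiftCnt = shiftCnt & (bitSize - 1).
        int masked = cnt & (BIT_SIZE - 1);
        if (masked == 0) {
            // ROR src, 0 is just src; without this, the left shift count
            // would be BIT_SIZE - 0 = 32, which is out of range.
            return src;
        }
        int left = BIT_SIZE - masked;
        checkImmediateShift(masked);
        checkImmediateShift(left);
        return (src >>> masked) | (src << left);
    }
}
```

With the mask in place, a count of 0 or 32 never reaches the shift encoder, which mirrors the two effects named in the description: no invalid shift count, and `ROR src, 0` reducing to `src`.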
------------- PR Comment: https://git.openjdk.org/jdk/pull/24051#issuecomment-2750462088 From jbhateja at openjdk.org Tue Mar 25 08:34:27 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 25 Mar 2025 08:34:27 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v3] In-Reply-To: References: Message-ID: > This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. > > The special handling leverages the instruction semantics. The central idea is to shuffle the operands such that the smaller input is assigned to the second operand for a min operation, or the larger input is assigned to the second operand for a max operation. In addition, the result equals NaN if an unordered comparison detects that the first input is a NaN value; otherwise we return the result of the min/max operation. > > Kindly review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.felixcloutier.com/x86/vminsh > [2] https://www.felixcloutier.com/x86/vmaxsh Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24169/files - new: https://git.openjdk.org/jdk/pull/24169/files/d1fd0d84..fe793a53 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=01-02 Stats: 89 lines in 3 files changed: 1 ins; 40 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/24169.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24169/head:pull/24169 PR: https://git.openjdk.org/jdk/pull/24169 From jbhateja at openjdk.org Tue Mar 25 08:34:27 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 25 Mar 2025 08:34:27 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v2] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 00:16:14 GMT, Sandhya 
Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor cleanup > > src/hotspot/cpu/x86/assembler_x86.cpp line 13758: > >> 13756: attributes.set_is_evex_instruction(); >> 13757: attributes.set_embedded_opmask_register_specifier(mask); >> 13758: attributes.reset_is_clear_context(); > > Why do we do reset_is_clear_context here? We want kdst bits to be set/reset and no merge context. Actually, it's not relevant in this case. The EVEX.Z bit is used to select between merging and zeroing semantics w.r.t. the vector destination. For an opmask destination we always set the [bits corresponding to masked out lanes to zero](https://www.felixcloutier.com/x86/vcmpph#:~:text=CMP_OPERATOR%20tsrc2%0A%20%20%20%20ELSE-,DEST.bit%5Bj%5D%20%3A%3D%200,-DEST%5BMAXKL%2D1) > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7093: > >> 7091: } >> 7092: >> 7093: void C2_MacroAssembler::scalar_max_min_fp16(int opcode, XMMRegister dst, XMMRegister src1, XMMRegister src2, > > Any reason we are not doing this along the lines of scalar emit_fp_min_max? For most common cases an emit_fp_min_max based sequence would have much better latency. We don't need any blend emulation on CPUs supporting AVX512-FP16, it's specific to E-core targets. 
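The operand-shuffling idea from the 8352585 description can be illustrated with a small Java model. This is a sketch under assumptions, not the code in the PR: it uses `float` as a stand-in for Float16, all names (`MinMaxSketch`, `x86StyleMin`, `javaStyleMin`, and so on) are hypothetical, and `x86StyleMin`/`x86StyleMax` only model the behavior documented at the linked vminsh/vmaxsh pages, where the second source is returned if either input is NaN or both inputs are zero.

```java
public final class MinMaxSketch {
    // Model of the scalar x86 min: NaN or two zeros -> second source.
    static float x86StyleMin(float src1, float src2) {
        if (Float.isNaN(src1) || Float.isNaN(src2)) return src2;
        if (src1 == 0.0f && src2 == 0.0f) return src2;
        return src1 < src2 ? src1 : src2;
    }

    // Model of the scalar x86 max, with the same second-source rule.
    static float x86StyleMax(float src1, float src2) {
        if (Float.isNaN(src1) || Float.isNaN(src2)) return src2;
        if (src1 == 0.0f && src2 == 0.0f) return src2;
        return src1 > src2 ? src1 : src2;
    }

    static boolean signBitSet(float f) {
        return Float.floatToRawIntBits(f) < 0;
    }

    // Min with Java semantics: a sign-bit-set input goes to the second slot,
    // so -0.0 wins over +0.0; a NaN left in the first slot is returned.
    static float javaStyleMin(float a, float b) {
        boolean swap = signBitSet(a);
        float first = swap ? b : a;
        float second = swap ? a : b;
        float r = x86StyleMin(first, second);
        return Float.isNaN(first) ? first : r; // unordered check on first
    }

    // Max with Java semantics: a sign-bit-clear input goes to the second
    // slot, so +0.0 wins over -0.0; NaN handling is as for min.
    static float javaStyleMax(float a, float b) {
        boolean swap = signBitSet(b);
        float first = swap ? b : a;
        float second = swap ? a : b;
        float r = x86StyleMax(first, second);
        return Float.isNaN(first) ? first : r;
    }
}
```

The swap is keyed on the sign bit rather than on a full comparison, which is enough because the plain instruction already orders ordinary values correctly; only signed zeros and NaNs need the extra steering.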
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2011566840 PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2011566750 From mbaesken at openjdk.org Tue Mar 25 08:36:12 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 25 Mar 2025 08:36:12 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: Message-ID: <_24qhztVuCyVHLQZYiS0BXsxBQj9wc0YHBkrZQBCdj4=.5444e185-3c80-43eb-a81b-c0f50bfd1a2f@github.com> On Mon, 10 Mar 2025 13:37:23 GMT, Matthias Baesken wrote: > When running jtreg tests on macOS aarch64 with ubsan-enabled binaries, in the test > java/foreign/TestHandshake > this error/warning is reported: > > jdk/src/hotspot/share/opto/block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' > UndefinedBehaviorSanitizer:DEADLYSIGNAL > UndefinedBehaviorSanitizer: nested bug in the same thread, aborting. > > It seems to happen in this calculation (the float value does not fit into an int): int to_pct = (int) ((100 * freq) / target->_freq); Should we maybe do the frequency calculation adjustment in a separate follow-up? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23962#issuecomment-2750479688 From rehn at openjdk.org Tue Mar 25 08:39:17 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 25 Mar 2025 08:39:17 GMT Subject: RFR: 8352607: RISC-V: use cmove in min/max when Zicond is supported In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 12:53:01 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > We can let min/max use cmove rather than a branch if Zicond is supported. > At the same time, this patch also simplifies the code of min/max. > > Thanks! Marked as reviewed by rehn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24153#pullrequestreview-2712848348 From chagedorn at openjdk.org Tue Mar 25 08:50:08 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Mar 2025 08:50:08 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v3] In-Reply-To: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: > The patch fixes the issue of creating an Initialized Assertion Predicate at a loop X from a Template Assertion Predicate that was originally created for a loop Y. Using the unrelated loop values from loop Y for the Initialized Assertion Predicate will let it fail during runtime and we execute a `halt` instruction. This was originally reported with [JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428). > > Note that most of the line changes are from new tests. > > ### The Problem > There are multiple test cases triggering the same problem. In the following, when referring to "the test case", I'm referring to `testTemplateAssertionPredicateNotRemovedHalt()` which was written from scratch and contains more detailed comments explaining how we end up with executing a `Halt` node in more details. > > #### An Inner Loop without Parse Predicates > The graph in `testTemplateAssertionPredicateNotRemovedHalt()` looks like this after creating `LoopNodes` for the outer `for` and inner `while (true)` loop: > > ![image](https://github.com/user-attachments/assets/7ac60e35-0b7e-4f04-b9dd-6eb8c8654a15) > > We only have Parse Predicates for the outer loop. Why? 
> > Before beautify loop, we have the following region which merges multiple backedges - the one from the `for` loop and the one from the `while (true)` loop: > > ![image](https://github.com/user-attachments/assets/7895161d-5ac1-46d6-93fe-5ab90ef24ab9) > > In `IdealLoopTree::merge_many_backedges()`, we notice that the hottest backedge is hot enough such that it is worth to have a separate merge point region for the inner and outer loop. We set everything up and eventually in `IdealLoopTree::split_outer_loop()`, we create a second `LoopNode`. > > For this inner `LoopNode`, we cannot set up `Parse Predicates` with the same UCTs as used for the outer loop. It would be incorrect when taking the trap to re-execute the inner and outer loop again while having already executed some of the outer loop's iterations. Thus, we get the graph shape with back-to-back `LoopNodes` as shown above. > > #### Predicates from a Folded Loop End up at Another Loop > As described in the previous section, we have an inner and outer `LoopNode` while the inner does not have Parse Predicates. In a series of events (see test case comments for more details), we first hoist a range check out of the outer loop during Loop Predication with a Template Assertion Predicate. Then, we fold the outer loop away because we find that it is only running for a single iteration and the bac... Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - Merge methods - Merge branch 'master' into JDK-8350579 - Small things - Fix test comments - New approach: Marking unrelated Template Assertion Predicates outside of IGVN by storing a reference to the loop it originally was created for. 
- Merge branch 'master' into JDK-8350579 - Revert fix completely - 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away during IGVN ------------- Changes: https://git.openjdk.org/jdk/pull/23823/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23823&range=02 Stats: 700 lines in 9 files changed: 572 ins; 44 del; 84 mod Patch: https://git.openjdk.org/jdk/pull/23823.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23823/head:pull/23823 PR: https://git.openjdk.org/jdk/pull/23823 From epeter at openjdk.org Tue Mar 25 08:50:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 08:50:09 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v3] In-Reply-To: References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: On Tue, 25 Mar 2025 08:46:09 GMT, Christian Hagedorn wrote: >> The patch fixes the issue of creating an Initialized Assertion Predicate at a loop X from a Template Assertion Predicate that was originally created for a loop Y. Using the unrelated loop values from loop Y for the Initialized Assertion Predicate will let it fail during runtime and we execute a `halt` instruction. This was originally reported with [JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428). >> >> Note that most of the line changes are from new tests. >> >> ### The Problem >> There are multiple test cases triggering the same problem. In the following, when referring to "the test case", I'm referring to `testTemplateAssertionPredicateNotRemovedHalt()` which was written from scratch and contains more detailed comments explaining how we end up with executing a `Halt` node in more details. 
>> >> #### An Inner Loop without Parse Predicates >> The graph in `testTemplateAssertionPredicateNotRemovedHalt()` looks like this after creating `LoopNodes` for the outer `for` and inner `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7ac60e35-0b7e-4f04-b9dd-6eb8c8654a15) >> >> We only have Parse Predicates for the outer loop. Why? >> >> Before beautify loop, we have the following region which merges multiple backedges - the one from the `for` loop and the one from the `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7895161d-5ac1-46d6-93fe-5ab90ef24ab9) >> >> In `IdealLoopTree::merge_many_backedges()`, we notice that the hottest backedge is hot enough such that it is worth to have a separate merge point region for the inner and outer loop. We set everything up and eventually in `IdealLoopTree::split_outer_loop()`, we create a second `LoopNode`. >> >> For this inner `LoopNode`, we cannot set up `Parse Predicates` with the same UCTs as used for the outer loop. It would be incorrect when taking the trap to re-execute the inner and outer loop again while having already executed some of the outer loop's iterations. Thus, we get the graph shape with back-to-back `LoopNodes` as shown above. >> >> #### Predicates from a Folded Loop End up at Another Loop >> As described in the previous section, we have an inner and outer `LoopNode` while the inner does not have Parse Predicates. In a series of events (see test case comments for more details), we first hoist a range check out of the outer loop during Loop Predication with a Template Assertion Predicate. Then, we fold the outer loop away because we find that it is... > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: > > - Merge methods > - Merge branch 'master' into JDK-8350579 > - Small things > - Fix test comments > - New approach: Marking unrelated Template Assertion Predicates outside of IGVN by storing a reference to the loop it originally was created for. > - Merge branch 'master' into JDK-8350579 > - Revert fix completely > - 8350579: Remove Template Assertion Predicates belonging to a > loop once it is folded away during IGVN Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23823#pullrequestreview-2712880758 From chagedorn at openjdk.org Tue Mar 25 08:50:10 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Mar 2025 08:50:10 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v2] In-Reply-To: References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: On Wed, 19 Mar 2025 14:36:29 GMT, Christian Hagedorn wrote: >> The patch fixes the issue of creating an Initialized Assertion Predicate at a loop X from a Template Assertion Predicate that was originally created for a loop Y. Using the unrelated loop values from loop Y for the Initialized Assertion Predicate will let it fail during runtime and we execute a `halt` instruction. This was originally reported with [JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428). >> >> Note that most of the line changes are from new tests. >> >> ### The Problem >> There are multiple test cases triggering the same problem. In the following, when referring to "the test case", I'm referring to `testTemplateAssertionPredicateNotRemovedHalt()` which was written from scratch and contains more detailed comments explaining how we end up with executing a `Halt` node in more details. 
>> >> #### An Inner Loop without Parse Predicates >> The graph in `testTemplateAssertionPredicateNotRemovedHalt()` looks like this after creating `LoopNodes` for the outer `for` and inner `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7ac60e35-0b7e-4f04-b9dd-6eb8c8654a15) >> >> We only have Parse Predicates for the outer loop. Why? >> >> Before beautify loop, we have the following region which merges multiple backedges - the one from the `for` loop and the one from the `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7895161d-5ac1-46d6-93fe-5ab90ef24ab9) >> >> In `IdealLoopTree::merge_many_backedges()`, we notice that the hottest backedge is hot enough such that it is worth to have a separate merge point region for the inner and outer loop. We set everything up and eventually in `IdealLoopTree::split_outer_loop()`, we create a second `LoopNode`. >> >> For this inner `LoopNode`, we cannot set up `Parse Predicates` with the same UCTs as used for the outer loop. It would be incorrect when taking the trap to re-execute the inner and outer loop again while having already executed some of the outer loop's iterations. Thus, we get the graph shape with back-to-back `LoopNodes` as shown above. >> >> #### Predicates from a Folded Loop End up at Another Loop >> As described in the previous section, we have an inner and outer `LoopNode` while the inner does not have Parse Predicates. In a series of events (see test case comments for more details), we first hoist a range check out of the outer loop during Loop Predication with a Template Assertion Predicate. Then, we fold the outer loop away because we find that it is... > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: > > - Small things > - Fix test comments > - New approach: Marking unrelated Template Assertion Predicates outside of IGVN by storing a reference to the loop it originally was created for. > - Merge branch 'master' into JDK-8350579 > - Revert fix completely > - 8350579: Remove Template Assertion Predicates belonging to a > loop once it is folded away during IGVN Thanks Emanuel for your review! I will submit some more testing with the latest changes before integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23823#issuecomment-2750514725 From chagedorn at openjdk.org Tue Mar 25 08:50:11 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Mar 2025 08:50:11 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v2] In-Reply-To: References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> <3vNSau6-nBLDkGGPoq2ijj4lSKAlB4KQfW-Ys3heuTA=.98670082-0972-4921-8991-a9dfa39cfa14@github.com> Message-ID: On Tue, 25 Mar 2025 08:23:31 GMT, Emanuel Peter wrote: >> The reasons I've split it is the following: >> - The bailout for non-counted loops is actually separate to the marking. So I have a two-step algorithm: bailout + marking which can nicely be split. >> - Having `mark_template_useful_if_matching_loop()` allows me to quickly read `visit()` and understand what's going on. Additionally, I can put the details about why we do the marking at the method comment for more interested code readers. Without the extracted method, I would probably need to put an extra "mark template useful if matching loop" comment + the 6 lines of comments at `mark_template_useful_if_matching_loop()` into the `visit()` method which makes it harder to grasp. 
>> >> I would prefer to stick to what I have now - but I admit it's a subjective matter :-) > > I leave it up to you :) > > But could `opaque_node->loop_node() == _loop_node` even be true if we have `!_loop_node->is_CountedLoop()`? or do we actually need a CountedLoop to even have the match? No, it cannot be true. `opaque_node->loop_node()` is a `CountedLoop`. We could have only `opaque_node->loop_node() == _loop_node`. I think I first had that as an assertion in place but then turned it into a bailout. But you're right, it would already be covered by the check. Given it's a rare edge case, I guess we can get rid of it. Then the reason above does not apply anymore that we have 2 steps but only 1. Then it does not make sense to split it further. I merged it back together. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23823#discussion_r2011591603 From chagedorn at openjdk.org Tue Mar 25 09:05:19 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Mar 2025 09:05:19 GMT Subject: RFR: 8352490: Fatal error message for unhandled bytecode needs more detail [v2] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 14:19:37 GMT, Saranya Natarajan wrote: >> Description: Improve the error message for unhandled bytecode in `line#129` of function `Bytecodes::Code ciBytecodeStream::next_wide_or_table` in file ciStream.cpp >> >> Solution: The error message is improved to print OPCODE and bytecode index (BCI) > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > adding information for printing current method Looks good to me, too! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24187#pullrequestreview-2712939880 From mli at openjdk.org Tue Mar 25 09:32:21 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 25 Mar 2025 09:32:21 GMT Subject: RFR: 8320997: RISC-V: C2 ReverseV In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 11:21:33 GMT, Ludovic Henry wrote: >> Hi, >> >> Can you help to review this patch to implement ReverseV? >> >> Thanks! > > Marked as reviewed by luhenry (Committer). Thank you @luhenry ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24096#issuecomment-2750629121 From mli at openjdk.org Tue Mar 25 09:32:22 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 25 Mar 2025 09:32:22 GMT Subject: Integrated: 8320997: RISC-V: C2 ReverseV In-Reply-To: References: Message-ID: <9tWbGQsv5XABWE9V4rT3FiFZTnhJ63D1XGRDt_95Qs0=.05581366-1640-4b27-802d-5bee5bf16d5c@github.com> On Tue, 18 Mar 2025 10:42:21 GMT, Hamlin Li wrote: > Hi, > > Can you help to review this patch to implement ReverseV? > > Thanks! This pull request has now been integrated. Changeset: 9f582e56 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/9f582e56baee0e7f5af20da0f395cd935bf5a962 Stats: 30 lines in 2 files changed: 29 ins; 0 del; 1 mod 8320997: RISC-V: C2 ReverseV Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/24096 From rcastanedalo at openjdk.org Tue Mar 25 09:34:19 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 25 Mar 2025 09:34:19 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v10] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 15:35:33 GMT, Daniel Lundén wrote: >> src/hotspot/share/opto/regmask.hpp line 188: >> >>> 186: // all registers/stack locations under _lwm and over _hwm are excluded. >>> 187: // The exception is (s10, s11, ...), where the value is decided solely by >>> 188: // _all_stack, regardless of the value of _hwm. 
>> >> This comment illustrates the case with `_offset = 0`, I think it would be useful to extend it with an example where `_offset > 0`. Here is a suggestion: https://github.com/openjdk/jdk/commit/8377012ac485a70703921822d58bc535bafb7a0a. Feel free to merge as-is or edit to your liking, if you agree. > > Looks good to me, now merged. There are likely other opportunities for more source code comment illustrations throughout `regmask.hpp`. `SUBTRACT_inner` and `overlap` come to mind, in particular. I'll have a look and see what can be improved. Thanks, Daniel! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2011688122 From mli at openjdk.org Tue Mar 25 09:34:23 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 25 Mar 2025 09:34:23 GMT Subject: Integrated: 8352607: RISC-V: use cmove in min/max when Zicond is supported In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 12:53:01 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > We can let min/max use cmove rather than a branch if Zicond is supported. > At the same time, this patch also simplifies the code of min/max. > > Thanks! This pull request has now been integrated. 
Changeset: 3d3b7820 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/3d3b7820371058b40f2e694536c98aa3900abb5f Stats: 66 lines in 1 file changed: 1 ins; 48 del; 17 mod 8352607: RISC-V: use cmove in min/max when Zicond is supported Reviewed-by: fyang, rehn ------------- PR: https://git.openjdk.org/jdk/pull/24153 From mli at openjdk.org Tue Mar 25 09:34:23 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 25 Mar 2025 09:34:23 GMT Subject: RFR: 8352607: RISC-V: use cmove in min/max when Zicond is supported In-Reply-To: <9o_gKmGW4-GPVv-JeIZvmMZaiipBYhCunGFFj6EIXVM=.f05e0c00-241a-4e46-8471-6ad9987f3934@github.com> References: <9o_gKmGW4-GPVv-JeIZvmMZaiipBYhCunGFFj6EIXVM=.f05e0c00-241a-4e46-8471-6ad9987f3934@github.com> Message-ID: <9M0RzM4W-7LDjgy7_HK8h5eW4jxCaHWlba4xA5XPgxs=.cddfd47c-6aa8-40f7-9a3b-541f4c05cb95@github.com> On Tue, 25 Mar 2025 06:41:13 GMT, Fei Yang wrote: >> Hi, >> Can you help to review this patch? >> We can let min/max use cmove rather than a branch if Zicond is supported. >> At the same time, this patch also simplifies the code of min/max. >> >> Thanks! > > Marked as reviewed by fyang (Reviewer). Thank you @RealFYang @robehn ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24153#issuecomment-2750633211 From rcastanedalo at openjdk.org Tue Mar 25 09:38:24 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 25 Mar 2025 09:38:24 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> Message-ID: <4URBuBBraSwcBpgEKaM-oyXREnlb7IO32kWXEgR5jec=.15f2031e-443e-4bf6-91f3-ea6d4a5718d7@github.com> On Mon, 24 Mar 2025 15:33:34 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. 
>> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. 
The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Extend example with offset register mask The overhead of the current version of this changeset (commit [fbfddb2](https://github.com/openjdk/jdk/pull/20404/commits/fbfddb292a0248ed187d28b30f3aa655fc28c0e5)) in terms of increased C2 compilation time for DaCapo 23 is of around 1.5%. This is, in my opinion, still within the acceptable range. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2750648679 From qxing at openjdk.org Tue Mar 25 09:54:12 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 25 Mar 2025 09:54:12 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v2] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 08:03:57 GMT, Emanuel Peter wrote: >> Hi all, >> >> This patch has now passed all GHA tests and is ready for further reviews. >> >> If there are any other suggestions for this PR, please let me know. >> >> Thanks! > > @MaxXSoft I'm not an expert with SafePoints, but I'd be willing to review if you answer my questions above, and maybe some more I'll have later ;) > > One question I just had now: Assume we now remove the SafePoint because there is that other call above. But what if later we inline that call - do we still have some SafePoint after that in the loop? Hi @eme64 , Thanks for your previous reviews!
Regarding your first question: > Though I am struggling to understand all of the comments... > > is that talking about a call in the outer loop, or the inner loop? This comment is really confusing. I had to review the code in this file multiple times to understand it properly. Here's my understanding: If there's a call in a loop that will definitely be executed and will perform a safepoint poll (`guaranteed_safepoint`), then we can remove all other non-call safepoints except for this call. However, in some cases, a non-call safepoint (ncsfpt) removed from an inner loop might also be part of an outer loop. If this removed safepoint happens to be the only one in the outer loop, it could cause problems. We should avoid this situation. The purpose of this method is to examine all loops in the loop tree, and mark ncsfpts that shouldn't be removed. Note that this method DOESN'T actually remove any safepoints; it only marks them. The actual removal will happen later, checking whether each safepoint has been marked by outer loops before deletion. The `C)` mentioned in the comment refers to: if a loop already contains a call that will definitely be executed, there's no need to mark any other safepoints. This is because no matter how many safepoints are removed from its inner loops, even if those safepoints are part of the outer loop, that call will still perform a safepoint poll. Therefore, the conclusion is that the call mentioned in `C)` refers to a call in the outer loop. The third question: > Assume we now remove the SafePoint because there is that other call above. But what if later we inline that call - do we still have some SafePoint after that in the loop? On the one hand, this situation won't occur in the current `Compile::Optimize` process. 
The `Optimize` method will always complete all inlining before performing loop optimization, as seen in: https://github.com/openjdk/jdk/blob/3d3b7820371058b40f2e694536c98aa3900abb5f/src/hotspot/share/opto/compile.cpp#L2363-L2365 On the other hand, this patch only fixes the issue where phase ideal loop cannot remove redundant ncsfpts in certain situations. In other cases, such as when ncsfpt appears before a call, C2 can still remove the ncsfpt even without this patch, resulting in a loop with no safepoints except for the call. If the issue you mentioned exists, this risk should have been present before, but so far, we haven't encountered this problem in any test or real-world scenarios. That's all. I'm still working on the second question and will add a comment once I figure it out. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-2750694862 From epeter at openjdk.org Tue Mar 25 10:09:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 10:09:27 GMT Subject: RFR: 8344942: Template-Based Testing Framework Message-ID: **Goal** We want to generate Java source code: - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). **How to get started** When reviewing, please start by looking at: https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
And then for a "tutorial", look at: `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` It shows these features: - The `body` of a Template is essentially a list of `Token`s that are concatenated. - Templates can be nested: a `TemplateWithArgs` is also a `Token`. - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. - The use of recursive templates, and `fuel` to limit the recursion. - `Name`s: useful to register field and variable names in code scopes. Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 For a better experience, you may want to generate the `javadocs`: `javadoc -sourcepath test/hotspot/jtreg:./test/lib compiler.lib.template_framework` **History** @TobiHartmann and I have played with code generators for a while, and have had the dream of doing that in a more principled way. And to hopefully make it more accessible for other VM developers, with the goal of improving test coverage. @TobiHartmann started with an initial prototype. @tobiasholenstein took over, and worked with our intern @maasaid. Their approach was to take old tests and templetize them ([draft PR](https://github.com/openjdk/jdk/pull/22358)). Their templates had holes for replacements, and a `$` prefix for variables name replacement. 
Once @tobiasholenstein left, I took over the project, and focused on nested template use, where templates could be passed arguments. I kept it string based, which worked, but the resulting syntax was a little cryptic. Debugging was difficult, as I had to produce custom stack traces, print available variables, etc. [Here is the string syntax based version](https://github.com/openjdk/jdk/pull/22483). @theoweidmannoracle reviewed my draft, and had some very good feedback. He was frustrated with the complexity, and also the string syntax. Over several iterations, we kept most of the complexity (because code generation is a little complex), but changed the approach to use Java Generics. I took one of his prototypes and fleshed it out with all the other necessary features. **Related Work** There is lots of related work, for test generation: - `Verify.java`: verify results. - `Generators.java`: generate random inputs from "interesting distributions". - `Compile Framework`: take string source code, compile and class-load it for execution. - [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) Template-Framework Library: future work. Provide lots of useful templates, generate Expressions and nested Statements, etc. [Here a previous PR where I am experimenting with different Library features.](https://github.com/openjdk/jdk/pull/23418). I decided to already include the Hooks from the library, as they are useful to have in the tests and examples already. 
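The `#name` and `$var` replacements listed among the tutorial features can be illustrated with a small stand-alone sketch. It deliberately does not use the Template Framework's API: `TemplateSketch`, `render`, and the explicit `id` parameter are hypothetical names invented for this illustration, and the real framework's rendering is likely more involved (typed arguments, nested tokens, hooks).

```java
import java.util.Map;

// Hypothetical stand-alone sketch; this is NOT the Template Framework API.
class TemplateSketch {
    // Renders one use of a template body:
    //  - "#key" is replaced by the argument value registered under "key",
    //  - "$name" becomes "name_<id>", so two uses with different ids
    //    never declare the same variable name.
    // Simplification: argument values must not contain '$' or '#'.
    static String render(String body, Map<String, String> args, int id) {
        String out = body;
        for (Map.Entry<String, String> e : args.entrySet()) {
            out = out.replace("#" + e.getKey(), e.getValue());
        }
        // "$1" in the replacement refers to the captured identifier.
        return out.replaceAll("\\$(\\w+)", "$1_" + id);
    }

    public static void main(String[] args) {
        String body = "int $var = #con; total += $var;";
        // Two uses of the same body get distinct variable names.
        System.out.println(render(body, Map.of("con", "42"), 7));
        System.out.println(render(body, Map.of("con", "1"), 42));
    }
}
```

Passing the id explicitly keeps `render` free of hidden state; a real generator would more plausibly hand out fresh ids itself for each template use.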
------------- Commit messages: - fix tests - whitespace - whitespace - fix whitespace - JDK-8344942 Changes: https://git.openjdk.org/jdk/pull/24217/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344942 Stats: 3920 lines in 24 files changed: 3920 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From azafari at openjdk.org Tue Mar 25 10:13:16 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 25 Mar 2025 10:13:16 GMT Subject: RFR: 8352141: UBSAN: fix the left shift of negative value in relocInfo.cpp, internal_word_Relocation::pack_data_to() In-Reply-To: References: Message-ID: <-896YAfp9geObPIJ1Ek3KNfnlzeAglqPGFdCvcFlI8Q=.fec8f0fb-6744-4961-ac24-da7e97a9dfea@github.com> On Mon, 24 Mar 2025 18:19:29 GMT, Vladimir Kozlov wrote: > Why it is UB for signed left shift for negative value? For example in signed short int x = -32768; signed short int y = x << 1; the value of `y` would be `0`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24196#issuecomment-2750746885 From rraj at openjdk.org Tue Mar 25 10:14:20 2025 From: rraj at openjdk.org (Rohit Arul Raj) Date: Tue, 25 Mar 2025 10:14:20 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 15:28:12 GMT, Rohit Arul Raj wrote: >> In JDK-8309130, Array sort was optimized using AVX512 SIMD instructions for x86_64. Currently, this optimization has been disabled for AMD Zen 4 [JDK-8317763] due to bad performance of compressstoreu. >> Ref: https://www.reddit.com/r/java/comments/171t5sj/heads_up_openjdk_implementation_of_avx512_based/. >> >> This patch enables Zen 4 to pick optimized AVX2 version of SIMD sort and Zen 5 picks the AVX512 version. >> >> JTREG Tests: Completed Tier1 & Tier2 tests on Zen4 & Zen5 - No Regressions. 
>> >> Attaching ArraySort performance data for Zen4 & Zen5. >> [Zen4-ArraySort-Data.txt](https://github.com/user-attachments/files/19245831/Zen4-ArraySort-Data.txt) >> [Zen5-ArraySort-Data.txt](https://github.com/user-attachments/files/19245833/Zen5-ArraySort-Data.txt) > > Rohit Arul Raj has updated the pull request incrementally with one additional commit since the last revision: > > create a separate method to check for cpu's supporting avx512 version of simd sort @vamsi-parasa : Could you please review or provide feedback on this patch? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24053#issuecomment-2750749018 From thartmann at openjdk.org Tue Mar 25 10:18:43 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 25 Mar 2025 10:18:43 GMT Subject: RFR: 8352866: TestLogJIT.java runs wrong test class Message-ID: Fixed wrong test class in `@run` statement and fixed comment style in three unrelated tests. Best regards, Tobias ------------- Commit messages: - 8352866: TestLogJIT.java runs wrong test class Changes: https://git.openjdk.org/jdk/pull/24221/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24221&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352866 Stats: 5 lines in 4 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24221.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24221/head:pull/24221 PR: https://git.openjdk.org/jdk/pull/24221 From duke at openjdk.org Tue Mar 25 10:19:23 2025 From: duke at openjdk.org (Marc Chevalier) Date: Tue, 25 Mar 2025 10:19:23 GMT Subject: Integrated: 8347459: C2: missing transformation for chain of shifts/multiplications by constants In-Reply-To: References: Message-ID: <0ZhUMd6V0nkGTRD0wM8Bo7qROnWI5EAqkk_GrIbLjHs=.789f1891-50e6-4431-a50f-4999fa263551@github.com> On Fri, 21 Feb 2025 15:57:30 GMT, Marc Chevalier wrote: > This collapses double shift lefts by constants in a single constant: (x << con1) << con2 => x << (con1 + con2). 
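The rewrite stated in the sentence above can be checked in plain Java. This is a minimal sketch, not C2 code: `ShiftCollapseSketch`, `twoShifts`, and `collapsed` are hypothetical names, and both constants are assumed to lie in the range 0..31. It also shows why the sum of the two constants cannot simply be reused as the new shift amount: Java masks an `int` shift count to its low five bits.

```java
// Illustrative sketch only; C2 performs this rewrite on IR nodes, not on source code.
class ShiftCollapseSketch {
    // Original shape: two left shifts by constants, with 0 <= con1, con2 < 32.
    static int twoShifts(int x, int con1, int con2) {
        return (x << con1) << con2;
    }

    // Collapsed shape: a single shift by the sum of the constants.
    static int collapsed(int x, int con1, int con2) {
        int sum = con1 + con2;
        // An int shift by 40 would behave like a shift by 8 (40 & 31),
        // so a sum of 32 or more has to produce 0 explicitly.
        return sum >= 32 ? 0 : x << sum;
    }

    public static void main(String[] args) {
        int x = 0x1234;
        // Sum below 32: both shapes compute x << 19.
        System.out.println(twoShifts(x, 3, 16) == collapsed(x, 3, 16));
        // Sum of 40: every bit is shifted out in both shapes.
        System.out.println(twoShifts(1, 20, 20) == collapsed(1, 20, 20));
        // Using the raw sum as the shift count would wrap around instead.
        System.out.println((1 << 40) == (1 << 8));
    }
}
```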
Care must be taken in the case con1 + con2 is bigger than the number of bits in the integer type. In this case, we must simplify to 0. > > Moreover, the simplification logic of the sign extension trick had to be improved. For instance, we use `(x << 16) >> 16` to convert a 32 bits into a 16 bits integer, with sign extension. When storing this into a 16-bit field, this can be simplified into simple `x`. But in the case where `x` is itself a left-shift expression, say `y << 3`, this PR makes the IR looks like `(y << 19) >> 16` instead of the old `((y << 3) << 16) >> 16`. The former logic didn't handle the case where the left and the right shift have different magnitude. In this PR, I generalize this simplification to cases where the left shift has a larger magnitude than the right shift. This improvement was needed not to miss vectorization opportunities: without the simplification, we have a left shift and a right shift instead of a single left shift, which confuses the type inference. > > This also works for multiplications by powers of 2 since they are already translated into shifts. > > Thanks, > Marc This pull request has now been integrated. Changeset: bdcac986 Author: Marc Chevalier Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/bdcac98673a2250f608bdf244e17578eecb30fbe Stats: 526 lines in 6 files changed: 509 ins; 0 del; 17 mod 8347459: C2: missing transformation for chain of shifts/multiplications by constants Reviewed-by: dfenacci, epeter ------------- PR: https://git.openjdk.org/jdk/pull/23728 From tholenstein at openjdk.org Tue Mar 25 10:21:13 2025 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 25 Mar 2025 10:21:13 GMT Subject: RFR: 8344942: Template-Based Testing Framework In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 08:31:36 GMT, Emanuel Peter wrote: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. 
for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/jtreg:./test/lib compiler.lib.template_framework` > > **History** > @TobiHartmann and I have played with code generators for a while, and have had the dream of doing that in a more principled way. And to hopefully... > /contributor add @TobiHartmann @tobiasholenstein @maasaid @theoweidmannoracle Nice work @eme64 ! I think I should be in the census. But @maasaid @theoweidmannoracle might not be ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2750767762 From mgronlun at openjdk.org Tue Mar 25 10:40:16 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 25 Mar 2025 10:40:16 GMT Subject: RFR: 8352696: JFR: assert(false): EA: missing memory path [v5] In-Reply-To: References: Message-ID: <2qZ6KK0j0Fx8gytx2f51ikfN3DRydC1bc14AfgEGBhM=.68348a98-14a2-46fa-9f33-b9a25b4b283b@github.com> On Tue, 25 Mar 2025 08:25:34 GMT, Tobias Hartmann wrote: >> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: >> >> simplified test > > Looks good. Thank you @TobiHartmann, @shipilev and @vnkozlov for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24192#issuecomment-2750817188 From mgronlun at openjdk.org Tue Mar 25 10:40:17 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 25 Mar 2025 10:40:17 GMT Subject: Integrated: 8352696: JFR: assert(false): EA: missing memory path In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 11:35:03 GMT, Markus Grönlund wrote: > Greetings, > > In debug builds, we can sometimes assert because C2's Escape Analysis does not recognize a pattern where one input of memory Phi node is a MergeMem node, and another is a RAW store.
> > This pattern is created by the jdk.jfr.internal.JVM.commit() intrinsic, which is inlined because of inlining JFR events, for example jdk.VirtualThreadStart. > > As a result, EA complains about a strange memory graph. > > Testing: jdk_jfr > > Thanks > Markus This pull request has now been integrated. Changeset: 721ef767 Author: Markus Grönlund URL: https://git.openjdk.org/jdk/commit/721ef76738a2145bdff9b8534d3512282c61db8b Stats: 101 lines in 2 files changed: 94 ins; 2 del; 5 mod 8352696: JFR: assert(false): EA: missing memory path Reviewed-by: thartmann, shade, kvn ------------- PR: https://git.openjdk.org/jdk/pull/24192 From rcastanedalo at openjdk.org Tue Mar 25 10:52:11 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 25 Mar 2025 10:52:11 GMT Subject: RFR: 8352866: TestLogJIT.java runs wrong test class In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 10:13:03 GMT, Tobias Hartmann wrote: > Fixed wrong test class in `@run` statement and fixed comment style in three unrelated tests. > > Best regards, > Tobias Looks good, and trivial. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24221#pullrequestreview-2713268276 From chagedorn at openjdk.org Tue Mar 25 11:21:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Mar 2025 11:21:13 GMT Subject: RFR: 8352866: TestLogJIT.java runs wrong test class In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 10:13:03 GMT, Tobias Hartmann wrote: > Fixed wrong test class in `@run` statement and fixed comment style in three unrelated tests. > > Best regards, > Tobias Good catch! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24221#pullrequestreview-2713349354 From epeter at openjdk.org Tue Mar 25 11:27:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 11:27:33 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects Message-ID: We should extend the functionality of Verify.checkEQ: - Allow different NaN encodings to be seen as equal (by default). - Compare VectorAPI vectors. - Compare Exceptions, and their messages. - Compare arbitrary Objects via Reflection. Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. ------------- Commit messages: - clean up test - JDK-8352869 Changes: https://git.openjdk.org/jdk/pull/24224/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352869 Stats: 656 lines in 3 files changed: 567 ins; 2 del; 87 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From thartmann at openjdk.org Tue Mar 25 11:55:16 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 25 Mar 2025 11:55:16 GMT Subject: RFR: 8352866: TestLogJIT.java runs wrong test class In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 10:13:03 GMT, Tobias Hartmann wrote: > Fixed wrong test class in `@run` statement and fixed comment style in three unrelated tests. > > Best regards, > Tobias Thanks for the reviews Roberto and Christian! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24221#issuecomment-2751008544 From thartmann at openjdk.org Tue Mar 25 11:55:16 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 25 Mar 2025 11:55:16 GMT Subject: Integrated: 8352866: TestLogJIT.java runs wrong test class In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 10:13:03 GMT, Tobias Hartmann wrote: > Fixed wrong test class in `@run` statement and fixed comment style in three unrelated tests. > > Best regards, > Tobias This pull request has now been integrated. Changeset: 67c44052 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/67c4405250f93a1188c03bf336db160f77a10c7f Stats: 5 lines in 4 files changed: 0 ins; 0 del; 5 mod 8352866: TestLogJIT.java runs wrong test class Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24221 From chagedorn at openjdk.org Tue Mar 25 12:01:27 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Mar 2025 12:01:27 GMT Subject: RFR: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away [v3] In-Reply-To: References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: On Tue, 25 Mar 2025 08:50:08 GMT, Christian Hagedorn wrote: >> The patch fixes the issue of creating an Initialized Assertion Predicate at a loop X from a Template Assertion Predicate that was originally created for a loop Y. Using the unrelated loop values from loop Y for the Initialized Assertion Predicate will let it fail during runtime and we execute a `halt` instruction. This was originally reported with [JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428). >> >> Note that most of the line changes are from new tests. >> >> ### The Problem >> There are multiple test cases triggering the same problem. 
In the following, when referring to "the test case", I'm referring to `testTemplateAssertionPredicateNotRemovedHalt()` which was written from scratch and contains more detailed comments explaining how we end up with executing a `Halt` node in more details. >> >> #### An Inner Loop without Parse Predicates >> The graph in `testTemplateAssertionPredicateNotRemovedHalt()` looks like this after creating `LoopNodes` for the outer `for` and inner `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7ac60e35-0b7e-4f04-b9dd-6eb8c8654a15) >> >> We only have Parse Predicates for the outer loop. Why? >> >> Before beautify loop, we have the following region which merges multiple backedges - the one from the `for` loop and the one from the `while (true)` loop: >> >> ![image](https://github.com/user-attachments/assets/7895161d-5ac1-46d6-93fe-5ab90ef24ab9) >> >> In `IdealLoopTree::merge_many_backedges()`, we notice that the hottest backedge is hot enough such that it is worth to have a separate merge point region for the inner and outer loop. We set everything up and eventually in `IdealLoopTree::split_outer_loop()`, we create a second `LoopNode`. >> >> For this inner `LoopNode`, we cannot set up `Parse Predicates` with the same UCTs as used for the outer loop. It would be incorrect when taking the trap to re-execute the inner and outer loop again while having already executed some of the outer loop's iterations. Thus, we get the graph shape with back-to-back `LoopNodes` as shown above. >> >> #### Predicates from a Folded Loop End up at Another Loop >> As described in the previous section, we have an inner and outer `LoopNode` while the inner does not have Parse Predicates. In a series of events (see test case comments for more details), we first hoist a range check out of the outer loop during Loop Predication with a Template Assertion Predicate. Then, we fold the outer loop away because we find that it is... 
> > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge methods > - Merge branch 'master' into JDK-8350579 > - Small things > - Fix test comments > - New approach: Marking unrelated Template Assertion Predicates outside of IGVN by storing a reference to the loop it originally was created for. > - Merge branch 'master' into JDK-8350579 > - Revert fix completely > - 8350579: Remove Template Assertion Predicates belonging to a > loop once it is folded away during IGVN Testing looked good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23823#issuecomment-2751021365 From chagedorn at openjdk.org Tue Mar 25 12:01:28 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 25 Mar 2025 12:01:28 GMT Subject: Integrated: 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away In-Reply-To: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> References: <5Nbgi31ds2bXEF3Uc9AL5DyAOUdmme6DCdvly0aY-60=.92863b9e-00b1-4894-87d1-1f460c8d5b20@github.com> Message-ID: On Thu, 27 Feb 2025 13:07:46 GMT, Christian Hagedorn wrote: > The patch fixes the issue of creating an Initialized Assertion Predicate at a loop X from a Template Assertion Predicate that was originally created for a loop Y. Using the unrelated loop values from loop Y for the Initialized Assertion Predicate will let it fail during runtime and we execute a `halt` instruction. This was originally reported with [JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428). > > Note that most of the line changes are from new tests. > > ### The Problem > There are multiple test cases triggering the same problem. 
In the following, when referring to "the test case", I'm referring to `testTemplateAssertionPredicateNotRemovedHalt()` which was written from scratch and contains more detailed comments explaining how we end up with executing a `Halt` node in more details. > > #### An Inner Loop without Parse Predicates > The graph in `testTemplateAssertionPredicateNotRemovedHalt()` looks like this after creating `LoopNodes` for the outer `for` and inner `while (true)` loop: > > ![image](https://github.com/user-attachments/assets/7ac60e35-0b7e-4f04-b9dd-6eb8c8654a15) > > We only have Parse Predicates for the outer loop. Why? > > Before beautify loop, we have the following region which merges multiple backedges - the one from the `for` loop and the one from the `while (true)` loop: > > ![image](https://github.com/user-attachments/assets/7895161d-5ac1-46d6-93fe-5ab90ef24ab9) > > In `IdealLoopTree::merge_many_backedges()`, we notice that the hottest backedge is hot enough such that it is worth to have a separate merge point region for the inner and outer loop. We set everything up and eventually in `IdealLoopTree::split_outer_loop()`, we create a second `LoopNode`. > > For this inner `LoopNode`, we cannot set up `Parse Predicates` with the same UCTs as used for the outer loop. It would be incorrect when taking the trap to re-execute the inner and outer loop again while having already executed some of the outer loop's iterations. Thus, we get the graph shape with back-to-back `LoopNodes` as shown above. > > #### Predicates from a Folded Loop End up at Another Loop > As described in the previous section, we have an inner and outer `LoopNode` while the inner does not have Parse Predicates. In a series of events (see test case comments for more details), we first hoist a range check out of the outer loop during Loop Predication with a Template Assertion Predicate. Then, we fold the outer loop away because we find that it is only running for a single iteration and the bac... 
This pull request has now been integrated. Changeset: c953e0ed Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/c953e0ede17aed9b80a637f1ffce90b2ea54ae21 Stats: 700 lines in 9 files changed: 572 ins; 44 del; 84 mod 8350579: Remove Template Assertion Predicates belonging to a loop once it is folded away Reviewed-by: epeter, roland ------------- PR: https://git.openjdk.org/jdk/pull/23823 From hgreule at openjdk.org Tue Mar 25 13:26:13 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 25 Mar 2025 13:26:13 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations [v3] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 16:17:00 GMT, Emanuel Peter wrote: >> @eme64 could you take another look? Thanks! > > @SirYwell The code now looks really good, I launched some tests. Please ping me again in a day for the results! Thanks @eme64. Are the results in already? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23851#issuecomment-2751249884 From bulasevich at openjdk.org Tue Mar 25 13:58:16 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 25 Mar 2025 13:58:16 GMT Subject: RFR: 8352426: RelocIterator should correctly handle nullptr address of relocation data In-Reply-To: <_cscF51ZEkzKYPWbctERsGB7PWrY0vP3LTpO_jCAhQs=.d9af8eba-0033-4f0f-8b54-3441d863b4f8@github.com> References: <_cscF51ZEkzKYPWbctERsGB7PWrY0vP3LTpO_jCAhQs=.d9af8eba-0033-4f0f-8b54-3441d863b4f8@github.com> Message-ID: On Tue, 25 Mar 2025 01:57:48 GMT, Dean Long wrote: > instead of the more modern range-for loop @dean-long Let's look at this code. Would it be better with a for loop? 

    RelocIterator iter(nm, instruction_address(), next_instruction_address());
    while (iter.next()) {
      if (iter.type() == relocInfo::oop_type) {
        oop* oop_addr = iter.oop_reloc()->oop_addr();
        *oop_addr = cast_to_oop(x);
        break;
      } else if (iter.type() == relocInfo::metadata_type) {
        Metadata** metadata_addr = iter.metadata_reloc()->metadata_addr();
        *metadata_addr = (Metadata*)x;
        break;
      }
    }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24203#issuecomment-2751347913

From bulasevich at openjdk.org Tue Mar 25 14:03:13 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Tue, 25 Mar 2025 14:03:13 GMT
Subject: RFR: 8352426: RelocIterator should correctly handle nullptr address of relocation data
In-Reply-To:
References:
Message-ID:

On Mon, 24 Mar 2025 17:04:04 GMT, Boris Ulasevich wrote:

> This is a follow-up to the recent #24102 patch. It addresses an issue where RelocIterator may receive a nullptr as the relocation table address. This change can also serve as an independent fix for JDK-8352112.
>
> RelocIterator::initialize() and RelocIterator::next() perform decrement/increment operations on an internal relocation pointer.
> If nm->relocation_begin() returns nullptr, this results in undefined behavior, as pointer arithmetic on nullptr is prohibited by the C++ Standard.
>
> Instead of introducing a null-check (which would add overhead in RelocIterator::next(), a performance-sensitive path), we initialize _current with a dummy static variable. This pointer is never dereferenced, so its actual value is not important - it just serves to avoid undefined behavior.
>
> RelocIterator::RelocIterator constructor can initialize _current pointer as well. However, in that place we have an assert to ensure that nullptr value is not allowed, and it seems we do not need to apply dummy value there.
>
> Testing:
>
> The fix has been verified against the failure in JDK-8352112.
The issue no longer reproduces with this patch, regardless of whether the original fix from #24102 is applied. This change addresses the bug reported in JDK-8352426 by ensuring that a nullptr relocation table does not result in undefined behavior. RelocIterator isn't a simple iterator - it encapsulates a variety of functions beyond just iteration. Reworking the API to support a range-based for loop would require a significant redesign of its interface and behavior. In my opinion, such a rework is beyond the scope of JDK-8352426. If the goal is to modernize the RelocIterator API, including support for range-based for loops, we should either explicitly reformulate JDK-8352426 to include that broader scope, or better yet, create a separate JBS issue to track that as an independent refactoring. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24203#issuecomment-2751364575 From epeter at openjdk.org Tue Mar 25 14:06:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 14:06:14 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations [v3] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 14:00:27 GMT, Hannes Greule wrote: >> subnode has multiple nodes that are self-inverse but lacking the respective optimization. ReverseINode and ReverseLNode already have the optimization, but we can deduplicate the code for all those operations. >> >> For most nodes, the optimization is obvious. The NegF/DNodes however are worth to look at in detail imo: >> - `Float.NaN` has the same bits set as `-Float.NaN`. That means, in this specific case, the operation is a no-op anyway >> - For other values, the msb is flipped, flipping twice results in the original value again. >> >> Similar changes could be made to the corresponding vector nodes. If you want, I can do that in a follow-up RFE. >> >> One note: During benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516.
That means code like >> >> int v = 0; >> for (int datum : data) { >> v ^= Integer.reverseBytes(Integer.reverseBytes(datum)); >> } >> return v; >> >> was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > add tests for ReverseBytesS/ReverseBytesUS @SirYwell The tests are passing :green_circle: Thank you for all the work, especially for writing all the tests ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23851#pullrequestreview-2713903316 From hgreule at openjdk.org Tue Mar 25 14:31:12 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 25 Mar 2025 14:31:12 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations [v3] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 14:00:27 GMT, Hannes Greule wrote: >> subnode has multiple nodes that are self-inverse but lacking the respective optimization. ReverseINode and ReverseLNode already have the optimization, but we can deduplicate the code for all those operations. >> >> For most nodes, the optimization is obvious. The NegF/DNodes however are worth to look at in detail imo: >> - `Float.NaN` has the same bits set as `-Float.NaN`. That means, in this specific case, the operation is a no-op anyway >> - For other values, the msb is flipped, flipping twice results in the original value again. >> >> Similar changes could be made to the corresponding vector nodes. If you want, I can do that in a follow-up RFE. >> >> One note: During benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like >> >> int v = 0; >> for (int datum : data) { >> v ^= Integer.reverseBytes(Integer.reverseBytes(datum)); >> } >> return v; >> >> was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases.
> > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > add tests for ReverseBytesS/ReverseBytesUS Great! Do I need another review or can we integrate? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23851#issuecomment-2751454546 From rrich at openjdk.org Tue Mar 25 14:48:25 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 25 Mar 2025 14:48:25 GMT Subject: RFR: 8352065: [PPC64] C2: Implement PopCountVL, CountLeadingZerosV and CountTrailingZerosV nodes [v2] In-Reply-To: <1ZrSc1xlXIf0OoyrUSCaTJUswul25Z8Ha0_VT1W9Ejg=.efa1f984-a441-488f-829c-8b8b94b57af0@github.com> References: <1ZrSc1xlXIf0OoyrUSCaTJUswul25Z8Ha0_VT1W9Ejg=.efa1f984-a441-488f-829c-8b8b94b57af0@github.com> Message-ID: On Fri, 21 Mar 2025 11:04:32 GMT, David Linus Briemann wrote: >> VectorCastL2X was not added due to bad performance and thus the bit count instructions are only vectorized for `int` but not for `long`. > > David Linus Briemann has updated the pull request incrementally with one additional commit since the last revision: > > disable TestVectorPopcountVectorLong on power again Looks good. Cheers, Richard. ------------- Marked as reviewed by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24064#pullrequestreview-2714053494 From epeter at openjdk.org Tue Mar 25 14:51:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 14:51:18 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations [v3] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 14:28:08 GMT, Hannes Greule wrote: >> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: >> >> add tests for ReverseBytesS/ReverseBytesUS > > Great! Do I need another review or can we integrate? @SirYwell Thanks for asking. 
We generally want to have 2 reviews for Compiler changes :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23851#issuecomment-2751522106 From duke at openjdk.org Tue Mar 25 14:56:19 2025 From: duke at openjdk.org (David Linus Briemann) Date: Tue, 25 Mar 2025 14:56:19 GMT Subject: RFR: 8352065: [PPC64] C2: Implement PopCountVL, CountLeadingZerosV and CountTrailingZerosV nodes [v2] In-Reply-To: <1ZrSc1xlXIf0OoyrUSCaTJUswul25Z8Ha0_VT1W9Ejg=.efa1f984-a441-488f-829c-8b8b94b57af0@github.com> References: <1ZrSc1xlXIf0OoyrUSCaTJUswul25Z8Ha0_VT1W9Ejg=.efa1f984-a441-488f-829c-8b8b94b57af0@github.com> Message-ID: On Fri, 21 Mar 2025 11:04:32 GMT, David Linus Briemann wrote: >> VectorCastL2X was not added due to bad performance and thus the bit count instructions are only vectorized for `int` but not for `long`. > > David Linus Briemann has updated the pull request incrementally with one additional commit since the last revision: > > disable TestVectorPopcountVectorLong on power again Thanks for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24064#issuecomment-2751540103 From duke at openjdk.org Tue Mar 25 14:56:20 2025 From: duke at openjdk.org (duke) Date: Tue, 25 Mar 2025 14:56:20 GMT Subject: RFR: 8352065: [PPC64] C2: Implement PopCountVL, CountLeadingZerosV and CountTrailingZerosV nodes [v2] In-Reply-To: <1ZrSc1xlXIf0OoyrUSCaTJUswul25Z8Ha0_VT1W9Ejg=.efa1f984-a441-488f-829c-8b8b94b57af0@github.com> References: <1ZrSc1xlXIf0OoyrUSCaTJUswul25Z8Ha0_VT1W9Ejg=.efa1f984-a441-488f-829c-8b8b94b57af0@github.com> Message-ID: On Fri, 21 Mar 2025 11:04:32 GMT, David Linus Briemann wrote: >> VectorCastL2X was not added due to bad performance and thus the bit count instructions are only vectorized for `int` but not for `long`. 
> > David Linus Briemann has updated the pull request incrementally with one additional commit since the last revision: > > disable TestVectorPopcountVectorLong on power again @dbriemann Your change (at version afca9b3d7a854d8330ee0d32ad8b9751d8161a93) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24064#issuecomment-2751542679 From sviswanathan at openjdk.org Tue Mar 25 15:04:20 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 25 Mar 2025 15:04:20 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v2] In-Reply-To: References: Message-ID: <4IiXOB8F1-hsF_ulbTMZ-dlG7YWGieifLm-EiR5x24c=.aa0106aa-57b5-40ea-b515-cff933cbb2b5@github.com> On Tue, 25 Mar 2025 08:31:06 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7093: >> >>> 7091: } >>> 7092: >>> 7093: void C2_MacroAssembler::scalar_max_min_fp16(int opcode, XMMRegister dst, XMMRegister src1, XMMRegister src2, >> >> Any reason we are not doing this on lines of scalar emit_fp_min_max? For most common cases emit_fp_min_max based sequence would have much better latency. > > We don't need any blend emulation on CPUs supporting AVX512-FP16, it's specific to E-core targets. emit_fp_min_max in x86_64.ad doesn't have any blend emulation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2012311161 From duke at openjdk.org Tue Mar 25 15:19:29 2025 From: duke at openjdk.org (David Linus Briemann) Date: Tue, 25 Mar 2025 15:19:29 GMT Subject: Integrated: 8352065: [PPC64] C2: Implement PopCountVL, CountLeadingZerosV and CountTrailingZerosV nodes In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 16:09:34 GMT, David Linus Briemann wrote: > VectorCastL2X was not added due to bad performance and thus the bit count instructions are only vectorized for `int` but not for `long`. This pull request has now been integrated. 
Changeset: e98838f5 Author: David Linus Briemann Committer: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/e98838f58db1606f35c85ac9fcdbdf1076b6a303 Stats: 92 lines in 4 files changed: 88 ins; 0 del; 4 mod 8352065: [PPC64] C2: Implement PopCountVL, CountLeadingZerosV and CountTrailingZerosV nodes Reviewed-by: mdoerr, rrich ------------- PR: https://git.openjdk.org/jdk/pull/24064 From qamai at openjdk.org Tue Mar 25 15:46:17 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 25 Mar 2025 15:46:17 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v46] In-Reply-To: References: Message-ID: <0PruHb9eKYLiVaXhVqUPXu_UjkiWDuEQXb_0IRiCMkA=.662fdd8c-a12e-4762-b960-82fccb46f28e@github.com> > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 60 commits: - Merge branch 'master' into unsignedbounds - reviews - Merge branch 'master' into unsignedbounds - refine comments - Merge branch 'master' into unsignedbounds - Merge branch 'master' into unsignedbounds - harden SimpleCanonicalResult - number lemmas - include - clean up intn_t - ... and 50 more: https://git.openjdk.org/jdk/compare/e98838f5...9ca80236 ------------- Changes: https://git.openjdk.org/jdk/pull/17508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=45 Stats: 2353 lines in 13 files changed: 1789 ins; 328 del; 236 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From psandoz at openjdk.org Tue Mar 25 15:50:15 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 25 Mar 2025 15:50:15 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 10:11:13 GMT, Rohit Arul Raj wrote: > @vamsi-parasa : Could you please review or provide feedback on this patch? Srinivas does not currently have openjdk reviewer [status](https://openjdk.org/census#sparasa). @iwanowww if you have a few moments can you help review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24053#issuecomment-2751722836 From rehn at openjdk.org Tue Mar 25 15:55:37 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 25 Mar 2025 15:55:37 GMT Subject: RFR: 8352897: RISC-V: Change default value for UseConservativeFence Message-ID: Hi, please consider. gcc have stopped emitting io-bits for fences since 13. And we need to use newer gcc due to other compiler bugs. Therefore there is no point in letting JIT emit io-bits when the runtime don't have them. 
Thanks, Robbin ------------- Commit messages: - Default false Changes: https://git.openjdk.org/jdk/pull/24233/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24233&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352897 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24233.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24233/head:pull/24233 PR: https://git.openjdk.org/jdk/pull/24233 From epeter at openjdk.org Tue Mar 25 16:12:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 16:12:29 GMT Subject: RFR: 8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 17:50:30 GMT, Vladimir Kozlov wrote: >> This was a fuzzer failure, which hit an assert in SuperWord: >> >> `# assert(_cl->is_multiversion_fast_loop() == (_multiversioning_fast_proj != nullptr)) failed: must find the multiversion selector IFF loop is a multiversion fast loop` >> >> We had a fast main loop, but it could not find the `multiversion_if`. The reason was that the loop was a `PeelMainPost` loop, i.e. there is no pre-loop but only a single peeled iteration. This makes the pattern matching from main-loop via pre-loop to `multiversion_if` impossible. >> >> I'm proposing two changes in this PR: >> - We must check `peel_only`, to see if we are in a `PeelMainPost` or `PreMainPost` case, and only do multiversioning if we know that there will be a pre-loop. >> - In `eliminate_useless_multiversion_if` we should already detect that a main-loop that is marked as multiversioned should be able to find its `multiversion_if`. I'm removing its multiversioning marking if we cannot find the `multiversion_if`. >> >> I added 2 tests: >> - The fuzzer generated test that hits the assert before this patch. >> - An IR test that checks that we do not multiversion in a `PeelMainPost` loop case. 
>>
>> ---------------
>>
>> **FYI**: I tried to add an assert in `eliminate_useless_multiversion_if` that we must always find the `multiversion_if` from a multiversioned main loop. But there are cases where this can fail. Here an example:
>>
>> `test/hotspot/jtreg/compiler/locks/TestSynchronizeWithEmptyBlock.java`
>>
>> With flags: `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`
>>
>>
>> Counted Loop: N537/N176 counted [int,100),+1 (-1 iters)
>> Loop: N0/N0 has_sfpt
>> Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 495 }
>> Loop: N536/N535
>> Loop: N537/N176 counted [int,100),+1 (-1 iters) has_sfpt strip_mined
>> Loop: N379/N383 limit_check profile_predicated predicated counted [int,int),+1 (4 iters) pre rc has_sfpt
>> Loop: N353/N357 counted [int,1000),+1 (4 iters) post rc has_sfpt
>> Multiversion Loop: N537/N176 counted [int,100),+1 (100 iters) has_sfpt strip_mined
>> PreMainPost Loop: N537/N176 counted [int,100),+1 (100 iters) multiversion_fast has_sfpt strip_mined
>> Unroll 2 Loop: N537/N176 counted [int,100),+1 (100 iters) main multiversion_fast has_sfpt strip_mined
>> Poor node estimate: 306 >> 92
>> Loop: N0/N0 has_sfpt
>> Loop: N307/N361 limit_check profile_predicated predicated sfpts={...
>
>> If reviewers think this really should be investigated, I could file a follow-up RFE.
>
> Yes, please. Can you check without "strip mining" if we can eliminate this loop?

@vnkozlov @chhagedorn Thanks for the reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/24183#issuecomment-2751783316 From epeter at openjdk.org Tue Mar 25 16:12:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 16:12:31 GMT Subject: Integrated: 8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 08:22:57 GMT, Emanuel Peter wrote: > This was a fuzzer failure, which hit an assert in SuperWord: > > `# assert(_cl->is_multiversion_fast_loop() == (_multiversioning_fast_proj != nullptr)) failed: must find the multiversion selector IFF loop is a multiversion fast loop` > > We had a fast main loop, but it could not find the `multiversion_if`. The reason was that the loop was a `PeelMainPost` loop, i.e. there is no pre-loop but only a single peeled iteration. This makes the pattern matching from main-loop via pre-loop to `multiversion_if` impossible. > > I'm proposing two changes in this PR: > - We must check `peel_only`, to see if we are in a `PeelMainPost` or `PreMainPost` case, and only do multiversioning if we know that there will be a pre-loop. > - In `eliminate_useless_multiversion_if` we should already detect that a main-loop that is marked as multiversioned should be able to find its `multiversion_if`. I'm removing its multiversioning marking if we cannot find the `multiversion_if`. > > I added 2 tests: > - The fuzzer generated test that hits the assert before this patch. > - An IR test that checks that we do not multiversion in a `PeelMainPost` loop case. > > --------------- > > **FYI**: I tried to add an assert in `eliminate_useless_multiversion_if` that we must always find the `multiversion_if` from a multiversioned main loop. But there are cases where this can fail. 
Here an example: > > `test/hotspot/jtreg/compiler/locks/TestSynchronizeWithEmptyBlock.java` > > With flags: `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation` > > > Counted Loop: N537/N176 counted [int,100),+1 (-1 iters) > Loop: N0/N0 has_sfpt > Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 495 } > Loop: N536/N535 > Loop: N537/N176 counted [int,100),+1 (-1 iters) has_sfpt strip_mined > Loop: N379/N383 limit_check profile_predicated predicated counted [int,int),+1 (4 iters) pre rc has_sfpt > Loop: N353/N357 counted [int,1000),+1 (4 iters) post rc has_sfpt > Multiversion Loop: N537/N176 counted [int,100),+1 (100 iters) has_sfpt strip_mined > PreMainPost Loop: N537/N176 counted [int,100),+1 (100 iters) multiversion_fast has_sfpt strip_mined > Unroll 2 Loop: N537/N176 counted [int,100),+1 (100 iters) main multiversion_fast has_sfpt strip_mined > Poor node estimate: 306 >> 92 > Loop: N0/N0 has_sfpt > Loop: N307/N361 limit_check profile_predicated predicated sfpts={ 182 } > Loop: N556/N557 sfpts={ 559 } > Loop: N552/N554 counted... This pull request has now been integrated. Changeset: c856b342 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/c856b3425a70d2aecb6c5c44da36396a5d74b00d Stats: 139 lines in 5 files changed: 134 ins; 0 del; 5 mod 8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/24183 From kxu at openjdk.org Tue Mar 25 16:21:24 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 25 Mar 2025 16:21:24 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 08:14:04 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> add micro benchmark > > This looks really interesting! 
> > I see that you are doing some special pattern matching. I wonder if it might be worth generalizing the algorithm, to search through an arbitrary "tree" of additions, collect all "leaves" of (`variable * multiplier`), sort by `variable`, and compute new additions for each `variable`. What do you think? @eme64 Could you please take a look at this if you have some time? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-2751814976 From luhenry at openjdk.org Tue Mar 25 16:23:18 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 25 Mar 2025 16:23:18 GMT Subject: RFR: 8352897: RISC-V: Change default value for UseConservativeFence In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 15:49:40 GMT, Robbin Ehn wrote: > Hi, please consider. > > gcc have stopped emitting io-bits for fences since 13. > And we need to use newer gcc due to other compiler bugs. > Therefore there is no point in letting JIT emit io-bits when the runtime don't have them. > > Thanks, Robbin Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24233#pullrequestreview-2714400895 From rehn at openjdk.org Tue Mar 25 16:28:18 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 25 Mar 2025 16:28:18 GMT Subject: RFR: 8352897: RISC-V: Change default value for UseConservativeFence In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 16:20:46 GMT, Ludovic Henry wrote: >> Hi, please consider. >> >> gcc have stopped emitting io-bits for fences since 13. >> And we need to use newer gcc due to other compiler bugs. >> Therefore there is no point in letting JIT emit io-bits when the runtime don't have them. >> >> Thanks, Robbin > > Marked as reviewed by luhenry (Committer). Thank you @luhenry ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24233#issuecomment-2751837216 From epeter at openjdk.org Tue Mar 25 16:38:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 16:38:22 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v9] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 13:25:34 GMT, Roland Westrelin wrote: >> Ah, right, I see that you already mentioned that above. Should we then problem list the test with this change? Testing looks clean otherwise. > >> Ah, right, I see that you already mentioned that above. Should we then problem list the test with this change? Testing looks clean otherwise. > > https://github.com/openjdk/jdk/pull/23465 is a fix for JDK-8341976 and given it's much simpler than this change, I suppose it will get in first. @rwestrel Is this ready for review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2751865119 From epeter at openjdk.org Tue Mar 25 16:50:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 16:50:07 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v3] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 08:34:27 GMT, Jatin Bhateja wrote: >> This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. >> >> Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> Jatin >> >> [1] https://www.felixcloutier.com/x86/vminsh >> [2] https://www.felixcloutier.com/x86/vmaxsh > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 33: > 31: * @library /test/lib / > 32: * @summary Add special case handling for Float16.max/min x86 backend > 33: * @requires (os.simpleArch == "x64" & vm.cpu.features ~= ".*avx512_fp16.*" & vm.cpu.features ~= ".*avx512bw.*" & vm.cpu.features ~= ".*avx512vl.*") Can you please add restrictions to the IR rules, so that the test can run on all platforms? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2012521761 From kvn at openjdk.org Tue Mar 25 16:54:14 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 25 Mar 2025 16:54:14 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache [v2] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 13:46:06 GMT, Evgeny Astigeevich wrote: >> This benchmark is used to check performance impact of the code cache being sparse. >> >> We use C2 compiler to compile the same Java method multiple times to produce as much code as needed. The Java method is not trivial. It adds two 40-digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods.
>>
>> Results: code region size 2M (2097152) bytes
>> - Intel Xeon Platinum 8259CL
>>
>> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff |
>> |--- |--- |--- |--- |--- |--- |--- |
>> |128 |1 |128 |19.577 |0.619 |us/op | |
>> |128 |32 |4 |22.968 |0.314 |us/op |17.30% |
>> |128 |48 |3 |22.245 |0.388 |us/op |13.60% |
>> |128 |64 |2 |23.874 |0.84 |us/op |21.90% |
>> |128 |80 |2 |23.786 |0.231 |us/op |21.50% |
>> |128 |96 |1 |26.224 |1.16 |us/op |34% |
>> |128 |112 |1 |27.028 |0.461 |us/op |38.10% |
>> |256 |1 |256 |47.43 |1.146 |us/op | |
>> |256 |32 |8 |63.962 |1.671 |us/op |34.90% |
>> |256 |48 |5 |63.396 |0.247 |us/op |33.70% |
>> |256 |64 |4 |66.604 |2.286 |us/op |40.40% |
>> |256 |80 |3 |59.746 |1.273 |us/op |26% |
>> |256 |96 |3 |63.836 |1.034 |us/op |34.60% |
>> |256 |112 |2 |63.538 |1.814 |us/op |34% |
>> |512 |1 |512 |172.731 |4.409 |us/op | |
>> |512 |32 |16 |206.772 |6.229 |us/op |19.70% |
>> |512 |48 |11 |215.275 |2.228 |us/op |24.60% |
>> |512 |64 |8 |212.962 |2.028 |us/op |23.30% |
>> |512 |80 |6 |201.335 |12.519 |us/op |16.60% |
>> |512 |96 |5 |198.133 |6.502 |us/op |14.70% |
>> |512 |112 |5 |193.739 |3.812 |us/op |12.20% |
>> |768 |1 |768 |325.154 |5.048 |us/op | |
>> |768 |32 |24 |346.298 |20.196 |us/op |6.50% |
>> |768 |48 |16 |350.746 |2.931 |us/op |7.90% |
>> |768 |64 |12 |339.445 |7.927 |us/op |4.40% |
>> |768 |80 |10 |347.408 |7.355 |us/op |6.80% |
>> |768 |96 |8 |340.983 |3.578 |us/op |4.90% |
>> |768 |112 |7 |353.949 |2.98 |us/op |8.90% |
>> |1024 |1 |1024 |368.352 |5.961 |us/op | |
>> |1024 |32 |32 |463.822 |6.274 |us/op |25.90% |
>> |1024 |48 |21 |457.674 |15.144 |us/op |24.20% |
>> |1024 |64 |16 |477.694 |0.986 |us/op |29.70% |
>> |1024 |80 |13 |484.901 |32.601 |us/op |31.60% |
>> |1024 |96 |11 |480.8 |27.088 |us/op |30.50% |
>> |1024 |112 |9 |474.416 |10.053 |us/op |28.80% |
>>
>> - AArch64 Neoverse N1
>>
>> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff |...
> > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Separate active methods and method calling them with 128Mb dummy space I think it is fine to accept this benchmark. It could be useful. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23831#pullrequestreview-2714494331 From epeter at openjdk.org Tue Mar 25 16:57:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Mar 2025 16:57:08 GMT Subject: RFR: 8352316: More MergeStoreBench In-Reply-To: References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: On Wed, 19 Mar 2025 06:02:17 GMT, Shaojin Wen wrote:
>> Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works
>
> The performance numbers show that putNull_unsafePutInt and putNull_utf16_unsafePutLong perform more than 10 times better. It can be seen that MergeStore is very suitable for these scenarios.
>
> # Script
>
> git remote add wenshao git@github.com:wenshao/jdk.git
> git fetch wenshao
> git clone 23dba8c52454ae90eab4cb1b0a168c6e7249dd38
> make test TEST="micro:vm.compiler.MergeStoreBench.putNull"
>
>
> ## 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC Genoa)
>
> Benchmark                                     Mode  Cnt      Score      Error  Units
> MergeStoreBench.putNull_arraycopy             avgt    5   6715.041 ±   18.765  ns/op
> MergeStoreBench.putNull_getBytes              avgt    5   5880.725 ±   12.261  ns/op
> MergeStoreBench.putNull_getChars              avgt    5  11972.642 ±   24.990  ns/op
> MergeStoreBench.putNull_string_builder        avgt    5  15643.372 ± 4526.932  ns/op
> MergeStoreBench.putNull_unsafePutInt          avgt    5    280.570 ±    0.669  ns/op
> MergeStoreBench.putNull_utf16_arrayCopy       avgt    5  13053.191 ±   24.954  ns/op
> MergeStoreBench.putNull_utf16_string_builder  avgt    5  16349.747 ± 5029.799  ns/op
> MergeStoreBench.putNull_utf16_unsafePutLong   avgt    5    579.580 ±    0.710  ns/op
>
>
>
> ## 3.
aliyun_ecs_c8i_x64 (CPU Intel Xeon Emerald Rapids)
>
> Benchmark                                     Mode  Cnt      Score      Error  Units
> MergeStoreBench.putNull_arraycopy             avgt    5   8029.622 ±   60.856  ns/op
> MergeStoreBench.putNull_getBytes              avgt    5   7444.635 ±   39.552  ns/op
> MergeStoreBench.putNull_getChars              avgt    5  16657.442 ±  147.301  ns/op
> MergeStoreBench.putNull_string_builder        avgt    5  23008.159 ± 6143.167  ns/op
> MergeStoreBench.putNull_unsafePutInt          avgt    5    235.302 ±    2.004  ns/op
> MergeStoreBench.putNull_utf16_arrayCopy       avgt    5  18330.317 ±  142.242  ns/op
> MergeStoreBench.putNull_utf16_string_builder  avgt    5  25843.593 ± 7089.392  ns/op
> MergeStoreBench.putNull_utf16_unsafePutLong   avgt    5   1860.076 ±   16.703  ns/op
>
>
> ## 4. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710)
>
> Benchmark                                     Mode  Cnt      Score      Error  Units
> MergeStoreBench.putNull_arraycopy             avgt    5   8114.176 ±   36.685  ns/op
> MergeStoreBench.putNull_getBytes              avgt    5   6171.538 ±    5.845  ns/op
> MergeStoreBench.putNull_getChars              avgt    5  10432.681 ±   26.401  ns/op
> MergeStoreBench.putNull_string_builder        avgt    5  21238.753 ± 1428.244  n...

@wenshao Do you have any insight from this benchmark? What was your motivation for it? I also wonder if an IR test for some of the cases would be helpful. IR tests give us more info about what the compiler produced, and if there is a change in VM behaviour the IR test catches it in regular testing. Benchmarks are not run regularly, and regressions would therefore not be caught. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2751928466 From kvn at openjdk.org Tue Mar 25 17:25:15 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 25 Mar 2025 17:25:15 GMT Subject: RFR: 8352426: RelocIterator should correctly handle nullptr address of relocation data In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 14:00:20 GMT, Boris Ulasevich wrote: >> This is a follow-up to the recent #24102 patch. It addresses an issue where RelocIterator may receive a nullptr as the relocation table address.
This change can also serve as an independent fix for JDK-8352112. >> >> RelocIterator::initialize() and RelocIterator::next() perform decrement/increment operations on an internal relocation pointer. >> If nm->relocation_begin() returns nullptr, this results in undefined behavior, as pointer arithmetic on nullptr is prohibited by the C++ Standard. >> >> Instead of introducing a null-check (which would add overhead in RelocIterator::next(), a performance-sensitive path), we initialize _current with a dummy static variable. This pointer is never dereferenced, so its actual value is not important - it just serves to avoid undefined behavior. >> >> RelocIterator::RelocIterator constructor can initialize _current pointer as well. However, in that place we have an assert to ensure that nullptr value is not allowed, and it seems we do not need to apply dummy value there. >> >> Testing: >> >> The fix has been verified against the failure in JDK-8352112. The issue no longer reproduces with this patch, regardless of whether the original fix from #24102 is applied. > > This change addresses the bug reported in JDK-8352426 by ensuring that a nullptr relocation table does not result in undefined behavior. > > RelocIterator isn't a simple iterator - it encapsulates a variety of functions beyond just iteration. Reworking the API to support a range-based for loop would require a significant redesign of its interface and behavior. In my opinion, such a rework is beyond the scope of JDK-8352426. If the goal is to modernize the RelocIterator API, including support for range-based for loops, we should either explicitly reformulate JDK-8352426 to include that broader scope, or better yet, create a separate JBS issue to track that as an independent refactoring. What do you think? I agree with @bulasevich that rewriting RelocIterator is out of scope of this RFE. And I am not sure that we should rewrite it at all.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24203#issuecomment-2752007234 From kvn at openjdk.org Tue Mar 25 17:35:13 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 25 Mar 2025 17:35:13 GMT Subject: RFR: 8352426: RelocIterator should correctly handle nullptr address of relocation data In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 17:04:04 GMT, Boris Ulasevich wrote: > This is a follow-up to the recent #24102 patch. It addresses an issue where RelocIterator may receive a nullptr as the relocation table address. This change can also serve as an independent fix for JDK-8352112. > > RelocIterator::initialize() and RelocIterator::next() perform decrement/increment operations on an internal relocation pointer. > If nm->relocation_begin() returns nullptr, this results in undefined behavior, as pointer arithmetic on nullptr is prohibited by the C++ Standard. > > Instead of introducing a null-check (which would add overhead in RelocIterator::next(), a performance-sensitive path), we initialize _current with a dummy static variable. This pointer is never dereferenced, so its actual value is not important - it just serves to avoid undefined behavior. > > RelocIterator::RelocIterator constructor can initialize _current pointer as well. However, in that place we have an assert to ensure that nullptr value is not allowed, and it seems we do not need to apply dummy value there. > > Testing: > > The fix has been verified against the failure in JDK-8352112. The issue no longer reproduces with this patch, regardless of whether the original fix from #24102 is applied. I submitted our testing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24203#issuecomment-2752032656 From kvn at openjdk.org Tue Mar 25 18:00:27 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 25 Mar 2025 18:00:27 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v7] In-Reply-To: References: <8BMaz7Ssnwk6ywuPYfgaL1VSN7XZYBzpvMWRsG12-Eg=.441d03b6-4a0b-479b-b789-d9bfd03bd346@github.com> <0DkMl4UCWkNIUpFJKP2z4T-nP0NjJsrhbuczJLeWVHM=.64561b36-8472-4159-8d90-7cbec2b58d5d@github.com> Message-ID: On Tue, 25 Mar 2025 02:35:10 GMT, Dean Long wrote: >> @vnkozlov @dean-long >> It looks like the only way relocation can be performed correctly only at a safepoint. GC updating oops concurrently with relocation is an issue. >> The safepoint requirement will limit use cases of relocation. Relocations should not be done often and should be done in a batch. On another side, relocating at a safepoint will simplify clone code patching. We need to fix offsets in call instructions at call sites. We don't need to clear them. >> What do you think? > > If the only issue was GC updating oops concurrently, we could try using CompiledICLocker instead of forcing a safepoint. But now that I think about it, there are other issues, like the state of the entry barrier, and other GC epoch counters, and copying those consistently may require a safepoint and/or additional GC fixup logic. I think we need a GC expert to weigh in here. There may be other issues we are missing. I agree that we should do it at safepoint to avoid obvious concurrency. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2012647982 From dlong at openjdk.org Tue Mar 25 18:04:18 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 25 Mar 2025 18:04:18 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 13:37:23 GMT, Matthias Baesken wrote: > When running jtreg tests on macOS aarch64 with ubsan - enabled binaries, in the test > java/foreign/TestHandshake > this error/warning is reported : > > jdk/src/hotspot/share/opto/block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' > UndefinedBehaviorSanitizer:DEADLYSIGNAL > UndefinedBehaviorSanitizer: nested bug in the same thread, aborting. > > Seems it happens in this calculation (float value does not fit into an int) : int to_pct = (int) ((100 * freq) / target->_freq); Yes, we should fix frequency calculation in a separate follow up. I was trying to better understand what is going on, to see what value to clamp the result to. If this can only happens for infinite loops, then it seems like clamping to_pct to 100 is the right answer. And as Tom said, if we get it wrong, we just get a less good layout. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23962#issuecomment-2752109356 From kvn at openjdk.org Tue Mar 25 18:05:18 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 25 Mar 2025 18:05:18 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v7] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 23:00:36 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). 
It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: > > - Relocate nmethod at safepoint > - Fix windows build Currently the relocation is triggered only by WB API. Which limits testing to new tests. Which may not be enough. Consider adding stress flag to copy random nmethod when we hit any safepoint. ------------- Changes requested by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23573#pullrequestreview-2714713306 From dlong at openjdk.org Tue Mar 25 18:32:21 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 25 Mar 2025 18:32:21 GMT Subject: RFR: 8352426: RelocIterator should correctly handle nullptr address of relocation data In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 17:04:04 GMT, Boris Ulasevich wrote: > This is a follow-up to the recent #24102 patch. It addresses an issue where RelocIterator may receive a nullptr as the relocation table address. This change can also serve as an independent fix for JDK-8352112. > > RelocIterator::initialize() and RelocIterator::next() perform decrement/increment operations on an internal relocaction pointer. > If nm->relocation_begin() returns nullptr, this results in undefined behavior, as pointer arithmetic on nullptr is prohibited by the C++ Standard. 
> > Instead of introducing a null-check (which would add overhead in RelocIterator::next(), a performance-sensitive path), we initialize _current with a dummy static variable. This pointer is never dereferenced, so its actual value is not important - it just serves to avoid undefined behavior. > > RelocIterator::RelocIterator constructor can initialize _current pointer as well. However, in that place we have an assert to ensure that nullptr value is not allowed, and it seems we do not need to apply dummy value there. > > Testing: > > The fix has been verified against the failure in JDK-8352112. The issue no longer reproduces with this patch, regardless of whether the original fix from #24102 is applied. Looks good. I wasn't trying to suggest that we need to rewrite RelocIterator now to support C++ iterators, but that that kind of interface could avoid the "nullptr - 1" issue, because then the iterator's begin() and end() functions would both return nullptr, and iteration would end because begin() == end(). ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24203#pullrequestreview-2714816398 From kvn at openjdk.org Tue Mar 25 18:41:08 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 25 Mar 2025 18:41:08 GMT Subject: RFR: 8352141: UBSAN: fix the left shift of negative value in relocInfo.cpp, internal_word_Relocation::pack_data_to() In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 13:18:25 GMT, Afshin Zafari wrote: > The `offset` variable used in left-shift op can be a large number with its sign-bit set. This makes a negative value which is UB for left-shift and is reported as > `runtime error: left shift of negative value -25 at relocInfo.cpp:...` > > Using `java_shift_left()` function is the workaround to avoid UB. This function uses reinterpret_cast to cast from signed to unsigned and back. > > Tests: > linux-x64-debug tier1 on a UBSAN enabled build. 
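The workaround described in the quoted text (shift in unsigned arithmetic, then convert back) can be sketched as below. The helper name `shift_left_wrapping` is hypothetical and the body is an assumption about the general technique, not a copy of the HotSpot function. Before C++20, left-shifting a negative signed value is undefined, while unsigned shifts are always defined modulo 2^N.

```cpp
#include <cstdint>

// Hypothetical helper illustrating the technique; not HotSpot's implementation.
inline int32_t shift_left_wrapping(int32_t value, unsigned shift) {
  // Unsigned left shift is well defined: the result wraps modulo 2^32.
  // The shift count is masked so it is always smaller than the bit width.
  uint32_t bits = static_cast<uint32_t>(value) << (shift & 31u);
  // Converting back to signed is implementation-defined before C++20 and
  // modular from C++20 on; two's complement targets yield the expected value.
  return static_cast<int32_t>(bits);
}
```

For the value from the UBSAN report, `shift_left_wrapping(-25, 2)` gives `-100` on two's complement targets, the same result the plain shift produced in practice, without the undefined operation.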
This code was present from the beginning of HotSpot development. Note, the situation should never happen because we will "never" have 512 MB (2Gb / sizeof(oop) in 32-bit VM, less in 64-bit VM) offset in nmethod (it is internal_word relocation). With `section_width` 2 we should never hit overflow. The smaller change is better. I am curious if Dean's simple suggestion `offset * (1<< section_width)` will indeed avoid UBSAN. I am for such a fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24196#issuecomment-2752199381 From kvn at openjdk.org Tue Mar 25 18:45:19 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 25 Mar 2025 18:45:19 GMT Subject: RFR: 8352141: UBSAN: fix the left shift of negative value in relocInfo.cpp, internal_word_Relocation::pack_data_to() In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 13:18:25 GMT, Afshin Zafari wrote: > The `offset` variable used in left-shift op can be a large number with its sign-bit set. This makes a negative value which is UB for left-shift and is reported as > `runtime error: left shift of negative value -25 at relocInfo.cpp:...` > > Using `java_shift_left()` function is the workaround to avoid UB. This function uses reinterpret_cast to cast from signed to unsigned and back. > > Tests: > linux-x64-debug tier1 on a UBSAN enabled build. Maybe we should wait for the conclusion of the discussion in https://github.com/openjdk/jdk/pull/24184 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24196#issuecomment-2752209587 From vlivanov at openjdk.org Tue Mar 25 19:34:22 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 25 Mar 2025 19:34:22 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v6] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 21:39:03 GMT, Quan Anh Mai wrote: >> I still have a hard time making any conclusions until I see examples. Skeleton code doesn't say much to me. >> Also, would be nice to port some existing use cases. 
>> >> Overall, I'd like to build more confidence in general applicability of the proposed design before committing to it. > > @iwanowww There are some examples, most of these are about x86 since that is the architecture I'm most familiar with: > > #22922 > The relative cost of multiplication to left shift and addition is different between each architecture and each data type. For example, on x86, scalar multiplication has the latency being triple of that for shift and addition, so transforming `x * 5` into `(x << 2) + x` is reasonable, while transforming `x * 13` into `(x << 3) + (x << 2) + x` is pretty questionable. However, vector multiplication is a different story, i32 vector multiplication has around 5 times the latency, and i64 vector multiplication is even more expensive. So it is preferable to be more aggressive with this transformation. The story is completely different for AArch64, so we need a completely different heuristic there. > > #22886 > This is a PR taking advantage of this PR. In general, we try to lower the vector node early to take advantage of GVN. While if we try to implement the node in code emission there is no optimization there anymore. > > Some examples that I have given regarding vector insertion and vector extraction. The idea is the same, by expanding early, we can perform idealization and GVN on them, elide redundant nodes. Note that this transformation is only on x86: `ExtractI(v, 5) -> ExtractI(ExtractVector(v, 1), 1)` because the concept of 128-bit "lane" and the fact that scalar value can only interact with 128-bit vectors only exists there. > > https://bugs.openjdk.org/browse/JDK-8345812 > The general concept of a vector rearrange is to shuffle one vector with the index from another vector. However, the underlying machine may not support such shuffles directly. In those cases, we need to emulate that shuffle with other shuffle instructions. 
For example, consider a shuffle of short vectors `[x0, x1, x2, x3]` and `[y0, y1, y2, y3]`. However, x86 does not have short shuffles before AVX512BW, and it has a byte shuffle, so we transform the index vector into something that when we invoke the byte shuffle using the `x` and the transformed `y`, the result would be as if we have a short shuffle instruction to begin with. This is only done early because an index vector is often used for multiple shuffles with different first operands. And we want to do it reasonably late so that we can transform other things into vector rearrange without having to deal with `VectorLoadShuffleNode`. > > https://bugs.openjdk.org/browse/JDK-8351434 > The slice operation is a vector rearrange wi... Thanks for the pointers, @merykitty. First of all, all aforementioned PRs/RFEs focus on new functionality. Any experiments migrating existing use cases (in particular, final graph reshaping and post-loop opts GVN ones)? I see one reference to a PR dependent on proposed logic, so I'll comment on it ([PR #22886](https://github.com/openjdk/jdk/pull/22886)): * It looks strange to see such transformations happening in x86-specific code. Are other platforms expected to reimplement it one by one? (I'd expect to see expansion logic in shared code guarded by `Matcher::match_rule_supported_vector()`. And `VectorCastNode` looks like the best place for it.) * How much does it benefit from a full-blown GVN? For example, there's already some basic redundancy elimination happening during final graph reshaping. Will it be enough here? Overall, I'm still not convinced that the proposed patch (as it is shaped now) is the right way to go. What I'm looking for is more experimental data on the usage patterns where lowering takes place (new functionality is fine, but I'm primarily interested in migrating existing use cases). 
So far, I see 2 types of scenarios either benefitting from delayed GVN transformations (post-loop opts GVN transformations, macro node lowering, GC barriers expansion) or requiring ad-hoc platform-specific IR tweaks to simplify matching (happening during final graph reshaping). But it's still an open question to me what is the best way to cover ad-hoc platform-specific transformations on Ideal graph you seem to care about the most. From a maintenance perspective, it would help a lot to be able to easily share code across multiple ports while keeping ad-hoc platform-specific transformations close to the place where their results are consumed (in AD files). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2752319954 From sviswanathan at openjdk.org Tue Mar 25 20:23:25 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 25 Mar 2025 20:23:25 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v2] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 08:31:09 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 13758: >> >>> 13756: attributes.set_is_evex_instruction(); >>> 13757: attributes.set_embedded_opmask_register_specifier(mask); >>> 13758: attributes.reset_is_clear_context(); >> >> Why do we do reset_is_clear_context here? We want kdst bits to be set/reset and no merge context. > > Actually, it's not relevant in this case. EVEX.Z bit is used to select between merging and zeroing semantics w.r.t. vector destination. For opmask destination we always set the [bits corresponding to masked out lanes to zero](https://www.felixcloutier.com/x86/vcmpph#:~:text=CMP_OPERATOR%20tsrc2%0A%20%20%20%20ELSE-,DEST.bit%5Bj%5D%20%3A%3D%200,-DEST%5BMAXKL%2D1) I went through the manual and it does look like EVEX.Z bit should be set to 0 for instructions that set opmask as destination. So we do need to have the reset_is_clear_context() in this instruction. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2012876635 From dlong at openjdk.org Tue Mar 25 20:28:06 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 25 Mar 2025 20:28:06 GMT Subject: RFR: 8352141: UBSAN: fix the left shift of negative value in relocInfo.cpp, internal_word_Relocation::pack_data_to() In-Reply-To: References: Message-ID: <7we6VTr5X1ivoXJWK2kJohRpeXFivjUG9NOK1Y4GcRg=.2bea9162-3786-49ac-911f-e983a6a9c6bf@github.com> On Mon, 24 Mar 2025 13:18:25 GMT, Afshin Zafari wrote: > The `offset` variable used in left-shift op can be a large number with its sign-bit set. This makes a negative value which is UB for left-shift and is reported as > `runtime error: left shift of negative value -25 at relocInfo.cpp:...` > > Using `java_left_shif()` function is the workaround to avoid UB. This function uses reinterpret_cast to cast from signed to unsigned and back. > > Tests: > linux-x64-debug tier1 on a UBSAN enabled build. I tried it, and signed overflow with multiply is caught by -fsanitize=signed-integer-overflow, so we need to use unsigned to avoid UBSAN errors. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24196#issuecomment-2752434179 From vlivanov at openjdk.org Tue Mar 25 23:19:17 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 25 Mar 2025 23:19:17 GMT Subject: RFR: 8352426: RelocIterator should correctly handle nullptr address of relocation data In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 17:04:04 GMT, Boris Ulasevich wrote: > This is a follow-up to the recent #24102 patch. It addresses an issue where RelocIterator may receive a nullptr as the relocation table address. This change can also serve as an independent fix for JDK-8352112. > > RelocIterator::initialize() and RelocIterator::next() perform decrement/increment operations on an internal relocaction pointer. 
> If nm->relocation_begin() returns nullptr, this results in undefined behavior, as pointer arithmetic on nullptr is prohibited by the C++ Standard. > > Instead of introducing a null-check (which would add overhead in RelocIterator::next(), a performance-sensitive path), we initialize _current with a dummy static variable. This pointer is never dereferenced, so its actual value is not important - it just serves to avoid undefined behavior. > > RelocIterator::RelocIterator constructor can initialize _current pointer as well. However, in that place we have an assert to ensure that nullptr value is not allowed, and it seems we do not need to apply dummy value there. > > Testing: > > The fix has been verified against the failure in JDK-8352112. The issue no longer reproduces with this patch, regardless of whether the original fix from #24102 is applied. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24203#pullrequestreview-2715404732 From vlivanov at openjdk.org Tue Mar 25 23:35:18 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 25 Mar 2025 23:35:18 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: References: Message-ID: <_QvyAWuOP7uiUcWy0jEb6tN0CNIQOAQqZh8-7BxIWy4=.5a406f3e-ad92-492d-84c9-a3ef7e7941b2@github.com> On Mon, 17 Mar 2025 15:28:12 GMT, Rohit Arul Raj wrote: >> In JDK-8309130, Array sort was optimized using AVX512 SIMD instructions for x86_64. Currently, this optimization has been disabled for AMD Zen 4 [JDK-8317763] due to bad performance of compressstoreu. >> Ref: https://www.reddit.com/r/java/comments/171t5sj/heads_up_openjdk_implementation_of_avx512_based/. >> >> This patch enables Zen 4 to pick optimized AVX2 version of SIMD sort and Zen 5 picks the AVX512 version. >> >> JTREG Tests: Completed Tier1 & Tier2 tests on Zen4 & Zen5 - No Regressions. >> >> Attaching ArraySort performance data for Zen4 & Zen5. 
>> [Zen4-ArraySort-Data.txt](https://github.com/user-attachments/files/19245831/Zen4-ArraySort-Data.txt) >> [Zen5-ArraySort-Data.txt](https://github.com/user-attachments/files/19245833/Zen5-ArraySort-Data.txt) > > Rohit Arul Raj has updated the pull request incrementally with one additional commit since the last revision: > > create a separate method to check for cpu's supporting avx512 version of simd sort Overall, looks good. src/hotspot/cpu/x86/vm_version_x86.hpp line 778: > 776: static bool supports_avx512_simd_sort() { > 777: // Disable AVX512 version of SIMD Sort on AMD Zen4 Processors > 778: return ((is_intel() || (is_amd() && (cpu_family() > CPU_FAMILY_AMD_19H))) && supports_avx512dq()); } It's quite hard to parse. The following looks clearer to me: if (supports_avx512dq()) { // Disable AVX512 version of SIMD Sort on AMD Zen4 Processors. if (is_amd() && cpu_family() == CPU_FAMILY_AMD_19H) { return false; } return true; } return false; ------------- PR Review: https://git.openjdk.org/jdk/pull/24053#pullrequestreview-2715414909 PR Review Comment: https://git.openjdk.org/jdk/pull/24053#discussion_r2013066260 From duke at openjdk.org Wed Mar 26 00:54:25 2025 From: duke at openjdk.org (Johannes Graham) Date: Wed, 26 Mar 2025 00:54:25 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v27] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 15:54:55 GMT, Emanuel Peter wrote: >> Hi @eme64, do you have any more recommendations on this? > > @j3graham I think the VM changes look good, and the tests are almost there. So I launched some testing. Please ping me in a day for the results! Thank you @eme64, @merykitty, and @jaskarth for the feedback. 
I guess I'm now looking for one more Reviewer (with capital R). ------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2752891455 From swen at openjdk.org Wed Mar 26 02:00:13 2025 From: swen at openjdk.org (Shaojin Wen) Date: Wed, 26 Mar 2025 02:00:13 GMT Subject: RFR: 8352316: More MergeStoreBench In-Reply-To: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: On Wed, 19 Mar 2025 03:28:59 GMT, Shaojin Wen wrote: > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works I'm a developer of fastjson2. According to third-party benchmarks from https://github.com/fabienrenaud/java-json-benchmark, our library demonstrates the best performance. I would like to contribute some of these optimization techniques to OpenJDK, ideally by having C2 (the JIT compiler) directly support them. Below is an example related to this PR. We have a JavaBean that needs to be serialized to a JSON string: * JavaBean class Bean { public int value; } * Target JSON Output {"value":123} * CodeGen-Generated JSONSerializer fastjson2 uses ASM to generate a serializer class like the following. The methods writeNameValue0, writeNameValue1, and writeNameValue2 are candidate implementations. 
Among them, writeNameValue2 is the fastest when the field name length is 8, as it leverages UNSAFE.putLong for direct memory operations: class BeanJSONSerializer { private static final String name = "\"value\":"; private static final byte[] nameBytes = name.getBytes(); private static final long nameLong = UNSAFE.getLong(nameBytes, ARRAY_BYTE_BASE_OFFSET); int writeNameValue0(byte[] bytes, int off, int value) { name.getBytes(0, 8, bytes, off); off += 8; return writeInt32(bytes, off, value); } int writeNameValue1(byte[] bytes, int off, int value) { System.arraycopy(nameBytes, 0, bytes, off, 8); off += 8; return writeInt32(bytes, off, value); } int writeNameValue2(byte[] bytes, int off, int value) { UNSAFE.putLong(bytes, ARRAY_BYTE_BASE_OFFSET + off, nameLong); off += 8; return writeInt32(bytes, off, value); } } We propose that the C2 compiler could optimize cases where the field name length is 4 or 8 bytes by automatically using direct memory operations similar to writeNameValue2. This would eliminate the need for manual unsafe operations in user code and improve serialization performance for common patterns. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2753046348 From fyang at openjdk.org Wed Mar 26 02:00:07 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 26 Mar 2025 02:00:07 GMT Subject: RFR: 8352897: RISC-V: Change default value for UseConservativeFence In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 15:49:40 GMT, Robbin Ehn wrote: > Hi, please consider. > > gcc has stopped emitting io-bits for fences since 13. > And we need to use newer gcc due to other compiler bugs. > Therefore there is no point in letting JIT emit io-bits when the runtime doesn't have them. > > Thanks, Robbin Looks good. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24233#pullrequestreview-2715706809 From qamai at openjdk.org Wed Mar 26 03:33:13 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 26 Mar 2025 03:33:13 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v6] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 19:31:20 GMT, Vladimir Ivanov wrote: > First of all, all aforementioned PRs/RFEs focus on new functionality. I don't know where you get this impression from. Most of the aforementioned PRs/RFEs are existing transformations, we just do it elsewhere. #22922 is currently done in idealization in a clumsy manner, it would be better to do it with the consideration of the underlying hardware, since it is the entire purpose of that transformation. > Some examples that I have given regarding vector insertion and vector extraction. This is done during code emission, which does not benefit from common expression elimination https://bugs.openjdk.org/browse/JDK-8345812 is currently done during parsing, it would be easier for the autovectorizer to use the node if it wants to if we do the transformation later. For existing use cases, you can find a lot of them scattering around: - Transformation of `MulNode` to `LShiftNode` that we have covered above. - `CMoveNode` tries to push 0 to the right because on x86, making a constant 0 kills the flag register, and `cmov` is a 2-address instruction that kills the first input. - `final_graph_reshaping_impl` tries to swap the inputs of some nodes because on x86, these are 2-address instructions that kill the first input. - There are some transformations in `final_graph_reshaping_main_switch` that are guarded with `Matcher`, if we move them to lowering we can skip these queries. - A lot of use cases you can find in code emission (a.k.a. x86.ad). It makes sense, because everything you can do during lowering can be done during code emission, just in a less efficient manner. 
At this point you also have the most knowledge and can transform the instructions arbitrarily without worrying about other architectures. Some notable examples: min/max are expanded into compare and cmov, reverse short is implemented by reserse int and a right shift, `Conv2B` is just compare with 0 and setcc, a lot of vector nodes, etc. > I see one reference to a PR dependent on proposed logic, so I'll comment on it (https://github.com/openjdk/jdk/pull/22886): For the first question, the reason I believe is that it is not always possible to extract and insert elements into a vector efficiently. On x86 it takes maximum 2 instructions to extract a vector element and 3 instructions to insert an element into a vector. For the second question, without lowering the cost is miserable, if you are unpacking and packing a vector of 4 longs: // unpacking movq rax, xmm0 vpextrq rcx, xmm0, 1 vextracti128 xmm1, ymm0, 1 movq rdx, xmm1 vextracti128 xmm1, ymm0, 1 vpextrq rbx, xmm1, 1 // packing vpxor xmm0, xmm0, xmm0 vextracti128 xmm1, ymm0, 0 vpinsrq xmm1, xmm1, rax, 0 vinserti128 ymm0, ymm0, xmm1, 0 vextracti128 xmm1, ymm0, 0 vpinsrq xmm1, xmm1, rcx, 1 vinserti128 ymm0, ymm0, xmm1, 0 vextracti128 xmm1, ymm0, 1 vpinsrq xmm1, xmm1, rdx, 0 vinserti128 ymm0, ymm0, xmm1, 1 vextracti128 xmm1, ymm0, 1 vpinsrq xmm1, xmm1, rbx, 1 vinserti128 ymm0, ymm0, xmm1, 1 while if we have lowering, those can be simplified into: // unpacking movq rax, xmm0 vpextrq rcx, xmm0, 1 vextracti128 xmm1, ymm0, 1 movq rdx, xmm1 vpextrq rbx, xmm1, 1 // packing vmovq xmm0, rax vinsrq xmm0, xmm0, rcx, 1 vmovq xmm1, rdx vinsrq xmm1, xmm1, rbx, 1 vinserti128 ymm0, ymm0, xmm1, 1 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2753151256 From dfenacci at openjdk.org Wed Mar 26 07:10:15 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 26 Mar 2025 07:10:15 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR 
failure In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 00:01:01 GMT, Vladimir Ivanov wrote: >> # Issue >> >> The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2 compiling the test `testAndMaskSameValue1` expects to have 1 `AndV` nodes but it has none. >> >> # Cause >> >> The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes ?hide? the more ?precise? type. >> The graph that leads to the issue looks like this: >> ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) >> The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: >> ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) >> The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector`instead. >> The compiler then tries to inline one of the 2 `bynaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP` , has type `IntVector`. >> >> This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match. >> >> # Solution >> >> Instead of fixing this specific case we try a more generic approach: when late inlining we keep track of failed intrinsics and re-examine them during IGVN. If the `Ideal` method for their call node is called, we reschedule the intrinsic attempt for that call. 
>> >> # Testing >> >> Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) >> >> Tests: Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) > > The root cause of the bug is that type information obtained during inlining is not propagated until IGVN kicks in. Vector API is special here, because (1) it heavily relies on exact type information to perform intrinsification; and (2) vector intrinsics are processed during post-parse inlining. IMO the current fix (do cleanup when VectorBox is returned) is good enough as a stop-the-gap fix for Vector API issue (missed intrinsification opportunity). > > As an alternative fix, limited IGVN pass over `CastPP`/`CheckCastPP` users of result value may be enough to avoid full-blown cleanup. > > I suspect some other intrinsics may be susceptible to a similar issue, but in such case it would be more like a corner case (few intrinsics fail in rare conditions). A proper fix would be to re-examine failed intrinsics call site during IGVN and repeat intrinsifcation attempt when their inputs improve (akin to what is done in `CallStaticJavaNode::Ideal()`/`CallDynamicJavaNode::Ideal()`). @iwanowww, @TobiHartmann thanks so much for your reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21682#issuecomment-2753424128 From dfenacci at openjdk.org Wed Mar 26 07:10:15 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 26 Mar 2025 07:10:15 GMT Subject: Integrated: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:31:37 GMT, Damon Fenacci wrote: > # Issue > > The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2 compiling the test `testAndMaskSameValue1` expects to have 1 `AndV` node but it has none. > > # Cause > > The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type. > The graph that leads to the issue looks like this: > ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) > The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: > ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) > The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead. > The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`. > > This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match.
> > # Solution > > Instead of fixing this specific case we try a more generic approach: when late inlining we keep track of failed intrinsics and re-examine them during IGVN. If the `Ideal` method for their call node is called, we reschedule the intrinsic attempt for that call. > > # Testing > > Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) > > Tests: Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) This pull request has now been integrated. Changeset: 2e4d7d18 Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/2e4d7d1846d846fd98201b9b3abeb7b91239a40d Stats: 99 lines in 7 files changed: 45 ins; 3 del; 51 mod 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure Co-authored-by: Vladimir Ivanov Reviewed-by: thartmann, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/21682 From mchevalier at openjdk.org Wed Mar 26 08:33:58 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 26 Mar 2025 08:33:58 GMT Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v2] In-Reply-To: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: > `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. > This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. > > Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. 
>
> tl;dr:
> - C1: no problem, no change
> - C2:
>   - with intrinsics:
>     - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
>     - without overflow: no problem, no change
>   - without intrinsics: no problem, no change
>
> Before the fix:
>
> Benchmark (SIZE) Mode Cnt Score Error Units
> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ± 0.048 ms/op
> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ± 58.238 ms/op
> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ± 0.842 ms/op
> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ± 229.425 ms/op
> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ± 22.244 ms/op
> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ± 279.003 ms/op
> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ± 0.810 ms/op
> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ± 141.792 ms/op
> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ± 12.822 ms/op
> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ± 274.396 ms/op
> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ± 3.316 ms/op
> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ± 71.879 ms/op
> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ± 0.587 ms/op
> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ± 252.137 ms/op
> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ± 5.718 ms/op
> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ± 147.432 ms/op
> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ± 0.027 ms/op
> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ± 30.841 ms/op
> MathExact.C1_1.loopNegateLInBounds 1000000 avgt 3 2.422 ± 3.59...

Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: - Use builtin_throw - Merge branch 'master' into fix/Deoptimization-and-re-compilation-cycle-with-C2-compiled-code - More exhaustive bench - Limit inlining of math Exact operations in case of too many deopts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23916/files - new: https://git.openjdk.org/jdk/pull/23916/files/2317919f..9372228d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=00-01 Stats: 66384 lines in 1241 files changed: 32808 ins; 21395 del; 12181 mod Patch: https://git.openjdk.org/jdk/pull/23916.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23916/head:pull/23916 PR: https://git.openjdk.org/jdk/pull/23916 From mbaesken at openjdk.org Wed Mar 26 08:37:14 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 26 Mar 2025 08:37:14 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 18:01:57 GMT, Dean Long wrote: > Yes, we should fix frequency calculation in a separate follow up. I was trying to better understand what is going on, to see what value to clamp the result to. If this can only happens for infinite loops, then it seems like clamping to_pct to 100 is the right answer. And as Tom said, if we get it wrong, we just get a less good layout. Okay so not INT_MAX but 100 ; should I do it from both `from_pct ` and `to_pct ` ? 
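
For reference, the clamping idea discussed above can be sketched as follows. This is only a minimal illustration, assuming the percentage is first computed as a floating-point value; the helper name `clamped_pct` and the choice to map NaN and negative inputs to 0 are assumptions made for the example, not code taken from `block.cpp`.

```cpp
#include <cmath>

// Hypothetical helper (not the actual HotSpot code): converts a
// floating-point percentage to int without ever casting a value that
// is outside the range of int, which is undefined behavior in C++.
static int clamped_pct(double pct) {
  if (std::isnan(pct) || pct <= 0.0) {
    return 0;    // assumption: treat NaN and negative ratios as 0%
  }
  if (pct >= 100.0) {
    return 100;  // covers huge values such as 9.97582e+36 and +infinity
  }
  return static_cast<int>(pct);  // value is now known to be in (0, 100)
}
```

The range check happens before the cast, so the conversion itself is always well defined; applying the same helper to both `from_pct` and `to_pct` would keep the two computations symmetric.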
------------- PR Comment: https://git.openjdk.org/jdk/pull/23962#issuecomment-2753594389 From mbaesken at openjdk.org Wed Mar 26 08:37:15 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 26 Mar 2025 08:37:15 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 08:33:56 GMT, Matthias Baesken wrote: > Yes, we should fix frequency calculation in a separate follow up Do you want me to open a JBS issue for this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23962#issuecomment-2753595803 From mchevalier at openjdk.org Wed Mar 26 08:39:09 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 26 Mar 2025 08:39:09 GMT Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v2] In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> <_3l8ylsbgvsqQE1Ihp0BUAx2o_VzcS6R2jWBSKW9u1E=.0dcb6086-ff6f-4c9a-b990-6665a476a3dc@github.com> Message-ID: On Fri, 21 Mar 2025 22:34:43 GMT, Vladimir Ivanov wrote: >> I think adapting and re-using `builtin_throw` like you described is reasonable but I let @iwanowww confirm :slightly_smiling_face: > > Yes, that's basically what I had in mind. > > Currently, the focus of the intrinsic is on well-behaved case (overflows are **very** rare). `builtin_throw()` covers more ground and optimize for scenarios when exceptions are thrown. But it depends on `ciMethod::can_omit_stack_trace()` where `-XX:-OmitStackTraceInFastThrow` mode will suffer from the original problem (continuous deoptimizations), plus a round of recompilations before giving up. > > I suggest to improve and reuse `builtin_throw` here and add additional checks in the intrinsic to guard against problematic scenario with continuous deoptimizations. 
IMO it improves performance model for a wide range of use cases while addressing pathological scenarios.

So, I have done something like that (getting the exception object to throw from parameter, and factor out the logic whether builtin_throw is possible, so we can bail out of intrinsics instead of cycling again). Tests seem to pass in the various cases I wrote.

As for benchmark, it's quite a change. I post only the new part, the rest is pretty much the same. C2_no_builtin_throw does what the original C2 was (no builtin throw, just bailing out of intrinsics to cut our losses), and new C2 is with builtin_throw.

tldr: builtin_throw makes the overflow case of the same order as the in-bound cases (1-4ms) instead of being about 100 times bigger (600-700ms with C1, C2 without intrinsics, C2 with bailing out).

MathExact.C2.loopAddIInBounds 1000000 avgt 3 1.657 ± 11.994 ms/op
MathExact.C2.loopAddIOverflow 1000000 avgt 3 1.313 ± 4.188 ms/op
MathExact.C2.loopAddLInBounds 1000000 avgt 3 0.980 ± 0.396 ms/op
MathExact.C2.loopAddLOverflow 1000000 avgt 3 2.474 ± 3.473 ms/op
MathExact.C2.loopDecrementIInBounds 1000000 avgt 3 3.733 ± 13.709 ms/op
MathExact.C2.loopDecrementIOverflow 1000000 avgt 3 2.792 ± 23.724 ms/op
MathExact.C2.loopDecrementLInBounds 1000000 avgt 3 2.761 ± 24.744 ms/op
MathExact.C2.loopDecrementLOverflow 1000000 avgt 3 2.730 ± 23.065 ms/op
MathExact.C2.loopIncrementIInBounds 1000000 avgt 3 3.134 ± 20.980 ms/op
MathExact.C2.loopIncrementIOverflow 1000000 avgt 3 3.271 ± 8.876 ms/op
MathExact.C2.loopIncrementLInBounds 1000000 avgt 3 2.756 ± 22.912 ms/op
MathExact.C2.loopIncrementLOverflow 1000000 avgt 3 4.549 ± 9.543 ms/op
MathExact.C2.loopMultiplyIInBounds 1000000 avgt 3 1.268 ± 0.574 ms/op
MathExact.C2.loopMultiplyIOverflow 1000000 avgt 3 1.572 ± 11.171 ms/op
MathExact.C2.loopMultiplyLInBounds 1000000 avgt 3 1.021 ± 1.054 ms/op
MathExact.C2.loopMultiplyLOverflow 1000000 avgt 3 3.167 ± 20.666 ms/op
MathExact.C2.loopNegateIInBounds 1000000 avgt 3 3.575 ± 29.997 ms/op
MathExact.C2.loopNegateIOverflow 1000000 avgt 3 4.222 ± 9.041 ms/op
MathExact.C2.loopNegateLInBounds 1000000 avgt 3 4.452 ± 6.680 ms/op
MathExact.C2.loopNegateLOverflow 1000000 avgt 3 4.739 ± 34.662 ms/op
MathExact.C2.loopSubtractIInBounds 1000000 avgt 3 1.087 ± 0.539 ms/op
MathExact.C2.loopSubtractIOverflow 1000000 avgt 3 3.027 ± 9.709 ms/op
MathExact.C2.loopSubtractLInBounds 1000000 avgt 3 1.197 ± 5.763 ms/op
MathExact.C2.loopSubtractLOverflow 1000000 avgt 3 1.765 ± 10.037 ms/op
MathExact.C2_no_builtin_throw.loopAddIInBounds 1000000 avgt 3 2.310 ± 2.990 ms/op
MathExact.C2_no_builtin_throw.loopAddIOverflow 1000000 avgt 3 594.036 ± 500.000 ms/op
MathExact.C2_no_builtin_throw.loopAddLInBounds 1000000 avgt 3 1.577 ± 14.053 ms/op
MathExact.C2_no_builtin_throw.loopAddLOverflow 1000000 avgt 3 631.345 ± 75.836 ms/op
MathExact.C2_no_builtin_throw.loopDecrementIInBounds 1000000 avgt 3 2.090 ± 0.937 ms/op
MathExact.C2_no_builtin_throw.loopDecrementIOverflow 1000000 avgt 3 618.080 ± 38.047 ms/op
MathExact.C2_no_builtin_throw.loopDecrementLInBounds 1000000 avgt 3 4.164 ± 6.184 ms/op
MathExact.C2_no_builtin_throw.loopDecrementLOverflow 1000000 avgt 3 596.031 ± 584.159 ms/op
MathExact.C2_no_builtin_throw.loopIncrementIInBounds 1000000 avgt 3 2.383 ± 11.729 ms/op
MathExact.C2_no_builtin_throw.loopIncrementIOverflow 1000000 avgt 3 626.425 ± 134.612 ms/op
MathExact.C2_no_builtin_throw.loopIncrementLInBounds 1000000 avgt 3 2.345 ± 13.927 ms/op
MathExact.C2_no_builtin_throw.loopIncrementLOverflow 1000000 avgt 3 630.535 ± 99.348 ms/op
MathExact.C2_no_builtin_throw.loopMultiplyIInBounds 1000000 avgt 3 1.419 ± 4.289 ms/op
MathExact.C2_no_builtin_throw.loopMultiplyIOverflow 1000000 avgt 3 587.796 ± 52.215 ms/op
MathExact.C2_no_builtin_throw.loopMultiplyLInBounds 1000000 avgt 3 0.934 ± 0.272 ms/op
MathExact.C2_no_builtin_throw.loopMultiplyLOverflow 1000000 avgt 3 589.736 ± 347.848 ms/op
MathExact.C2_no_builtin_throw.loopNegateIInBounds 1000000 avgt 3 2.236 ± 5.749 ms/op
MathExact.C2_no_builtin_throw.loopNegateIOverflow 1000000 avgt 3 618.711 ± 725.158 ms/op
MathExact.C2_no_builtin_throw.loopNegateLInBounds 1000000 avgt 3 2.605 ± 17.373 ms/op
MathExact.C2_no_builtin_throw.loopNegateLOverflow 1000000 avgt 3 627.055 ± 184.767 ms/op
MathExact.C2_no_builtin_throw.loopSubtractIInBounds 1000000 avgt 3 1.006 ± 0.584 ms/op
MathExact.C2_no_builtin_throw.loopSubtractIOverflow 1000000 avgt 3 588.062 ± 403.116 ms/op
MathExact.C2_no_builtin_throw.loopSubtractLInBounds 1000000 avgt 3 0.978 ± 0.193 ms/op
MathExact.C2_no_builtin_throw.loopSubtractLOverflow 1000000 avgt 3 611.004 ± 456.779 ms/op

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r2013625437

From chagedorn at openjdk.org Wed Mar 26 08:51:07 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 26 Mar 2025 08:51:07 GMT
Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2]
In-Reply-To: References: <2OJ3-DbdBBiHjZKVDGNXb-IFLqXi0jI4Ezz_zs6NsMA=.a904b56a-a3c4-4f06-a708-70223beccb60@github.com> Message-ID:

On Thu, 20 Mar 2025 08:08:25 GMT, Roberto Castañeda Lozano wrote:

>>> Capturing i >= 0 in the loop Phi or array address CastII or ConvI2L then enables better use of address modes on x86.
>>
>> That's very promising.
>>
>>> Except, narrowing the type of the Phi or CastII exposes C2 to the exact bug this PR tries to fix: what if the loop becomes unreachable but C2 can't fold it away and the Phi or CastII end up having an out of range input?
>>
>> But also a problem, indeed. I just think that going into the future, we should still make a reasonable effort to try and let the control path die sanely without needing this patch. It should only serve as a last resort to avoid breaking the graph. While I think it's the safest solution, my concern is that we will not find inefficiencies anymore with this patch.
For example, if someone breaks Assertion Predicates, how can we detect this when the graph will always be sane? It's especially tricky now that I'm still adding Assertion Predicate patches and things might break during development and it goes unnoticed. But maybe I just need to turn this patch off locally. >> >> A develop flag to turn this patch off could also help but then we have the problem that someone uses the flag and reports an assertion failure that is actually not a real bug because it's one of these kinds of failures we cannot fix otherwise. But I assume it's quite rare that this will happen. >> >>> For the test case that I added for this bug, the issue is that some CastII transformations widen the types of some nodes. I suppose the way to fix this would be to restrict those transformations so widening doesn't happen in some cases. It's going to be tricky (because widening happens so mostly identical CastII nodes can be commoned to improve code quality) and fragile (if to preserve performance, we choose to only restrict those transformations to few targeted cases). >> >> It sounds hard to find a control path removal fix for these kind of issues. Also for the type being zero on the div by zero failing path which lets some type nodes die and control is not because we don't have an "everything but zero" type. >> >> > For 8275202, what I tried doing is that when the new pass proves a condition constant, rather than constant fold the condition, it mark the test as always failing/succeeding (so (If (Bool ...))is transformed into(If (Opaque4 (Booland theOpaque4captures the final result of theBool. Then the Opaque4` constant folds later. I found several issues with this: >> >> Sounds interesting but as you've stated creates new... > >> * We should make sure that compilation speed is not significantly affected by doing this search on all dying `Type` nodes (maybe @robcasloz can give you some pointers here - he did some compilation time measurements before). 
> > I measured C2 speed for this patch on top of jdk-25+14 vs jdk-25+14 using DaCapo23 on two different platforms and do not see any significant effect, see detailed results [here](https://github.com/user-attachments/files/19361310/C2-speed-jdk-25%2B14-vs-JDK-8349479.pdf). We've also discussed this in our team meeting and we all agree on the need and usefulness of this patch. We should definitely move forward with it. Maybe it even has a measurable positive impact on footprint due to eagerly cutting off dead paths in IGVN (mentioned by @robcasloz offline). Summarizing my thoughts about what's left to do or decide upon: - Flag to disable this patch to detect inefficiencies like messing up Assertion Predicates. This should only be possibly done in stress jobs and in the future if 8275202 is disabled as well. The assumption is that we rarely crash when disabling this patch, so the overhead of looking into such issues is minimal (we could still remove the flag if we get too many false positive reports). I could, for example, then run my Assertion Predicate tests with that additional flag to still trigger the issues. - Run some extended CI testing (we can do that). - Maybe for tracking purposes file an RFE stating the idea to only run this patch on those cases where we cannot do something else to fold control. As you've already stated earlier, this is hard and might not even be possible. But maybe someone comes up with something in the future. What do you think? I will also have a closer look at the actual code later this week. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2753622436 From duke at openjdk.org Wed Mar 26 08:59:56 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 26 Mar 2025 08:59:56 GMT Subject: RFR: 8350471: Unhandled compilation bailout in GraphKit::builtin_throw Message-ID: # Issue Summary When creating a builtin exception node, a stress test decided to bail out as if the allocation of the builtin exception objects had failed. Since these are preallocated at VM creation, the test failure is a false positive. # Change Rationale `GraphKit::builtin_throw()` features a bailout check after getting an appropriate exception object. However, up to that point, the execution in `builtin_throw()` cannot fail. In particular, there can be no failure to allocate the exception because these are all preallocated during `Threads::create_vm()` startup in `universe_post_init()` and `Threads::initialize_java_lang_classes()`. Further, none of the three callers handles a possible bailout in `builtin_throw()`.
Hence, this PR removes the bailout check responsible for the test failure. # Testing - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078715650) - tier1 through tier3 and Oracle internal testing ------------- Commit messages: - graphKit: remove unneeded failure check in builtin_throw() Changes: https://git.openjdk.org/jdk/pull/24243/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24243&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350471 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24243.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24243/head:pull/24243 PR: https://git.openjdk.org/jdk/pull/24243 From qxing at openjdk.org Wed Mar 26 09:02:18 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Wed, 26 Mar 2025 09:02:18 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v2] In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 09:00:38 GMT, Qizheng Xing wrote: >> In `PhaseIdealLoop`, `IdealLoopTree::check_safepts` method checks if any call that is guaranteed to have a safepoint dominates the tail of the loop. In the previous implementation, `check_safepts` would stop if it found a local non-call safepoint. At this time, if there was a call before the safepoint in the dom-path, this safepoint would not be eliminated. >> >> loop-safepoint >> >> This patch changes the behavior of `check_safepts` to not stop when it finds a non-local safepoint. This makes simple loops with one method call ~3.8% faster (on aarch64). >> >> >> Benchmark Mode Cnt Score Error Units >> LoopSafepoint.loopVar avgt 15 208296.259 ± 1350.409 ns/op # baseline >> LoopSafepoint.loopVar avgt 15 200692.874 ± 616.770 ns/op # this patch >> >> >> Testing: tier1-2 on x86_64 and aarch64. > > Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into enhance-loop-safepoint-elim > - Add IR test and microbench. > - Make `PhaseIdealLoop` eliminate more redundant safepoints in loops. The second question: > If we now removed safepoints in places where we would actually have needed them: how would we find out? I suppose we would get longer time to safepoint - higher latency in some cases. How would we catch this with our tests? I tried running tier1 tests with `JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+SafepointTimeout -XX:+AbortVMOnSafepointTimeout -XX:SafepointTimeoutDelay=1000`, and there were no failures. Running with `-XX:SafepointTimeoutDelay=500` caused 1 random JDK test case to fail. But then I tried to build a JDK without this patch, and it still had the random failure with this option. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-2753652662 From thartmann at openjdk.org Wed Mar 26 09:06:10 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 26 Mar 2025 09:06:10 GMT Subject: RFR: 8350471: Unhandled compilation bailout in GraphKit::builtin_throw In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 08:55:10 GMT, Manuel Hässig wrote: > # Issue Summary > > When creating a builtin exception node, a stress test decided to bail out as if the allocation of the builtin exception objects had failed. Since these are preallocated at VM creation, the test failure is a false positive. > > # Change Rationale > > `GraphKit::builtin_throw()` features a bailout check after getting an appropriate exception object. However, up to that point, the execution in `builtin_throw()` cannot fail. In particular, there can be no failure to allocate the exception because these are all preallocated during `Threads::create_vm()` startup in `universe_post_init()` and `Threads::initialize_java_lang_classes()`.
Further, none of the three callers handles a possible bailout in `builtin_throw()`. Hence, this PR removes the bailout check responsible for the test failure > > # Testing > > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078715650) > - tier1 through tier3 and Oracle internal testing Looks good to me, thanks for investigating! ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24243#pullrequestreview-2716358391 From duke at openjdk.org Wed Mar 26 09:08:22 2025 From: duke at openjdk.org (duke) Date: Wed, 26 Mar 2025 09:08:22 GMT Subject: RFR: 8352490: Fatal error message for unhandled bytecode needs more detail [v2] In-Reply-To: References: Message-ID: <8TU7qUUZusfyH5V497hNA7qFHxMCxuuRrl7thKjKaao=.25597d07-7c65-42ba-bd10-326f638d16d6@github.com> On Mon, 24 Mar 2025 14:19:37 GMT, Saranya Natarajan wrote: >> Description: Improve the error message for unhandled bytecode in `line#129` of function `Bytecodes::Code ciBytecodeStream::next_wide_or_table` in file ciStream.cpp >> >> Solution: The error message is improved to print OPCODE and bytecode index (BCI) > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > adding information for printing current method @sarannat Your change (at version cd029a9a69bea7cdd6d22758aa56ec0ef602501a) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24187#issuecomment-2753669823 From chagedorn at openjdk.org Wed Mar 26 09:27:39 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Mar 2025 09:27:39 GMT Subject: RFR: 8350577: Fix missing Assertion Predicates when splitting loops Message-ID: _Note: The actual fix is only ~80 changed lines - everything else is about tests._ After integrating many preparatory sub-tasks, I'm finally fixing the last outstanding Assertion Predicate issues with this patch. 
For more background about Assertion Predicates, have a look at the following [blog post](https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html). ### Maintain Assertion Predicates when Splitting a Loop When performing Loop Predication on a counted loop, we create two Template Assertion Predicates for each hoisted range check. Whenever we split this loop as part of loop opts, we need to establish the Template Assertion Predicates at the new sub loops with the new loop init and stride and create Initialized Assertion Predicates from them. This ensures that a sub loop is folded when it becomes dead due to an impossible condition (e.g. always having a negative index for the hoisted range check in all loop iterations). #### Current State Today, we are already covering most of the cases where Assertion Predicates are required - but we are still missing some. The following table describes the current state for the different loop splitting optimizations: | Loop Optimization | Template Assertion Predicate | Initialized Assertion Predicate | | ------------------------ | --------------------------------------- | --------------------------------------- | | Create Main Loop | ? | ? | | Create Post Loop | ? | ? | | Loop Unswitching | ? | _not required, same init, stride and, limit_ | | Loop Unrolling | ? | ? | | Range Check Elimination | ? | ? | | Loop Peeling | ? | ? | | Splitting Main Loop | ? | ? | Whenever we apply a loop optimization that does not establish Template Assertion Predicates, then all subsequent loop splitting optimizations on that loop cannot establish Template Assertion Predicates, either, and we fail to emit Initialized Assertion Predicates which can lead to a broken graph. #### Fixing Unsupported Cases This patch provides fixes for the remaining unsupported cases as shown in the table above. 
With all the work done in previous PRs, the fix is quite straight forward: - Remove the restriction that we don't clone Template Assertion Predicate in Loop Peeling. - Remove the restriction that we only clone Template Assertion Predicate when Parse Predicates are available (this prevents, for example, establishing Assertion Predicates when splitting a main loop further). - Killing old Template Assertion Predicates eagerly after cloning from them is finished (this was missing and is more efficient than waiting for the next round of loop opts to mark them useless). When I was doing some refactoring work in earlier PRs, I did not want to include semantic changes and thus kept these actually unnecessary restrictions up. Now that the preparation work is complete, I can safely remove the blocking pieces to fix the remaining cases with not changing that many lines of code. ### Testing I've collected a lot of tests from various JBS reports and fuzzer reports and included them all as test cases with `testJBSNumber()`. I also created a lot of new tests to cover the different loop splitting cases. More advanced tests also cover the chaining of different loop splitting optimizations together. During the development I added more tests to cover some intermediate issues. Note that I always run the tests with `-XX:+AbortVMOnCompilationFailure` to crash on some compilation bailouts due to a broken graph (there should not be any compilation bailouts). ### Work Left to Do There are still some tasks left to tackle after this fix goes in: - Add some verification when eliminating useless Template Assertion Predicates ([JDK-8352418](https://bugs.openjdk.org/browse/JDK-8352418)). - Do a general pass over all the new predicate code added over the course of many PRs. I've added a lot of intermediate code that became obsolete again. There are probably some opportunities left to clean up the code further now. 
- Replace Template Assertion Predicate `If` nodes with a new dedicated `TemplateAssertionPredicate` node. We currently create the following nodes during Loop Predication for a hoisted range check: ![Screenshot from 2025-03-25 16-15-01](https://github.com/user-attachments/assets/4e724bf5-49bc-4b08-be6e-b916cb3680ac) What we actually need to keep around are both `Bool` nodes in order to create Initialized Assertion Predicates with them. We only used `If` nodes in the past due to easier matching back there with UCTs on the failing path. We got rid of the UCT and replaced it with a `Halt`. But having an `If` node in the first place is not really mandatory since we are always removing the Template Assertion Predicates after loop opts are over which basically means the `If` node a `nop`. What we can actually do is having a single dedicated `TemplateAssertionPredicate` CFG `nop` node instead that is folded after loop opts are over: ![image](https://github.com/user-attachments/assets/31733251-c617-479c-9c81-ab56a8758ee3) This also allows us to simplify code where we special case `OpaqueTemplateAssertionPredicate` bools for `If` nodes. - Enable Loop Peeling stress option ([JDK-8286805](https://bugs.openjdk.org/browse/JDK-8286805)). This could not have been done so far due to hitting the remaining Assertion Predicate issues too often. - Add IR tests to show the usefulness of Assertion Predicates that they can fold dead loops away. This can also be beneficial when https://github.com/openjdk/jdk/pull/23468 is integrated and we break Assertion Predicates in some way not caught by the tests added with this PR or existing tests that only trigger when the graph is broken. Thank again to @rwestrel, @eme64, and also @vnkozlov, for reviewing and discussing a lot of the work around Assertion Predicates! 
Thanks, Christian ------------- Commit messages: - 8350577: Fix missing Assertion Predicates when splitting loops Changes: https://git.openjdk.org/jdk/pull/24246/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24246&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350577 Stats: 1799 lines in 5 files changed: 1700 ins; 34 del; 65 mod Patch: https://git.openjdk.org/jdk/pull/24246.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24246/head:pull/24246 PR: https://git.openjdk.org/jdk/pull/24246 From shade at openjdk.org Wed Mar 26 09:36:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Mar 2025 09:36:30 GMT Subject: RFR: 8352948: Remove leftover runtime_x86_32.cpp after 32-bit x86 removal Message-ID: As I merged [JDK-8345169](https://bugs.openjdk.org/browse/JDK-8345169), I noticed one more file was left over: $ find | grep x86_32 ./src/hotspot/cpu/x86/runtime_x86_32.cpp My bad, this PR removes that leftover. ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/24249/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24249&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352948 Stats: 332 lines in 1 file changed: 0 ins; 332 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24249.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24249/head:pull/24249 PR: https://git.openjdk.org/jdk/pull/24249 From mchevalier at openjdk.org Wed Mar 26 09:41:26 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 26 Mar 2025 09:41:26 GMT Subject: RFR: 8352617: IR framework test TestCompileCommandFileWriter.java runs TestCompilePhaseCollector instead of itself Message-ID: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> Simply changing the path in `@run` was not enough (see JBS for details). And actually, the test wasn't doing anything before trying to open an output file to check the result. 
From various hints, I completed the test: I hope it was the intent! I think @chhagedorn's eye would be the most relevant. Thanks, Marc ------------- Commit messages: - Make TestCompileCommandFileWriter run itself Changes: https://git.openjdk.org/jdk/pull/24240/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24240&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352617 Stats: 10 lines in 1 file changed: 2 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24240.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24240/head:pull/24240 PR: https://git.openjdk.org/jdk/pull/24240 From chagedorn at openjdk.org Wed Mar 26 09:41:26 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Mar 2025 09:41:26 GMT Subject: RFR: 8352617: IR framework test TestCompileCommandFileWriter.java runs TestCompilePhaseCollector instead of itself In-Reply-To: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> References: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> Message-ID: <1vY1-VSV9Tj4rrY47DdpWBmnYEiuvSTfX2I5mhsODnw=.2faaa3a9-a6df-4ba4-ab79-813b4551e417@github.com> On Wed, 26 Mar 2025 07:36:37 GMT, Marc Chevalier wrote: > Simply changing the path in `@run` was not enough (see JBS for details). And actually, the test wasn't doing anything before trying to open an output file to check the result. From various hints, I completed the test: I hope it was the intent! > > I think @chhagedorn's eye would be the most relevant. > > Thanks, > Marc Good catch! I'm not sure why I removed this one line. It surely must have been exactly what you added now. Maybe I was doing some last experiments and somehow messed up to push it properly and then we missed it because it was actually not even run. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24240#pullrequestreview-2716449643 From mchevalier at openjdk.org Wed Mar 26 09:41:27 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 26 Mar 2025 09:41:27 GMT Subject: RFR: 8352617: IR framework test TestCompileCommandFileWriter.java runs TestCompilePhaseCollector instead of itself In-Reply-To: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> References: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> Message-ID: <4kcnJ_hXXskugWzn0x2cQNwH0cXpiftOQMTJlqdKKpI=.8aae37b8-2895-4cdc-b70e-c2f7cebed0b5@github.com> On Wed, 26 Mar 2025 07:36:37 GMT, Marc Chevalier wrote: > Simply changing the path in `@run` was not enough (see JBS for details). And actually, the test wasn't doing anything before trying to open an output file to check the result. From various hints, I completed the test: I hope it was the intent! > > I think @chhagedorn's eye would be the most relevant. > > Thanks, > Marc test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/flag/TestCompileCommandFileWriter.java line 85: > 83: private void check(Class testClass, boolean findIdeal, boolean findOpto, CompilePhase... compilePhases) throws IOException { > 84: var compilerDirectivesFlagBuilder = new CompilerDirectivesFlagBuilder(testClass); > 85: compilerDirectivesFlagBuilder.build(); I was tempted to write new CompilerDirectivesFlagBuilder(testClass).build(); since we don't use `compilerDirectivesFlagBuilder` after. But I felt like the style might not be liked. Opinions on that? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24240#discussion_r2013646843 From chagedorn at openjdk.org Wed Mar 26 09:41:27 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Mar 2025 09:41:27 GMT Subject: RFR: 8352617: IR framework test TestCompileCommandFileWriter.java runs TestCompilePhaseCollector instead of itself In-Reply-To: <4kcnJ_hXXskugWzn0x2cQNwH0cXpiftOQMTJlqdKKpI=.8aae37b8-2895-4cdc-b70e-c2f7cebed0b5@github.com> References: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> <4kcnJ_hXXskugWzn0x2cQNwH0cXpiftOQMTJlqdKKpI=.8aae37b8-2895-4cdc-b70e-c2f7cebed0b5@github.com> Message-ID: On Wed, 26 Mar 2025 08:50:32 GMT, Marc Chevalier wrote: >> Simply changing the path in `@run` was not enough (see JBS for details). And actually, the test wasn't doing anything before trying to open an output file to check the result. From various hints, I completed the test: I hope it was the intent! >> >> I think @chhagedorn's eye would be the most relevant. >> >> Thanks, >> Marc > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/flag/TestCompileCommandFileWriter.java line 85: > >> 83: private void check(Class testClass, boolean findIdeal, boolean findOpto, CompilePhase... compilePhases) throws IOException { >> 84: var compilerDirectivesFlagBuilder = new CompilerDirectivesFlagBuilder(testClass); >> 85: compilerDirectivesFlagBuilder.build(); > > I was tempted to write > > new CompilerDirectivesFlagBuilder(testClass).build(); > > since we don't use `compilerDirectivesFlagBuilder` after. But I felt like the style might not be liked. Opinions on that? I guess it's fine to go with `new CompilerDirectivesFlagBuilder(testClass).build()`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24240#discussion_r2013723724 From duke at openjdk.org Wed Mar 26 09:42:27 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Wed, 26 Mar 2025 09:42:27 GMT Subject: Integrated: 8352490: Fatal error message for unhandled bytecode needs more detail In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 10:19:50 GMT, Saranya Natarajan wrote: > Description: Improve the error message for unhandled bytecode in `line#129` of function `Bytecodes::Code ciBytecodeStream::next_wide_or_table` in file ciStream.cpp > > Solution: The error message is improved to print OPCODE and bytecode index (BCI) This pull request has now been integrated. Changeset: 059f190f Author: Saranya Natarajan Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/059f190f4b0c7836b89ca2070400529e8d33790b Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8352490: Fatal error message for unhandled bytecode needs more detail Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24187 From chagedorn at openjdk.org Wed Mar 26 09:48:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Mar 2025 09:48:07 GMT Subject: RFR: 8350471: Unhandled compilation bailout in GraphKit::builtin_throw In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 08:55:10 GMT, Manuel Hässig wrote: > # Issue Summary > > When creating a builtin exception node, a stress test decided to bail out as if the allocation of the builtin exception objects had failed. Since these are preallocated at VM creation, the test failure is a false positive. > > # Change Rationale > > `GraphKit::builtin_throw()` features a bailout check after getting an appropriate exception object. However, up to that point, the execution in `builtin_throw()` cannot fail.
In particular, there can be no failure to allocate the exception because these are all preallocated during `Threads::create_vm()` startup in `universe_post_init()` and `Threads::initialize_java_lang_classes()`. Further, none of the three callers handles a possible bailout in `builtin_throw()`. Hence, this PR removes the bailout check responsible for the test failure. > > # Testing > > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078715650) > - tier1 through tier3 and Oracle internal testing Looks good to me, too! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24243#pullrequestreview-2716489236 From mchevalier at openjdk.org Wed Mar 26 10:14:19 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 26 Mar 2025 10:14:19 GMT Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces Message-ID: If `TypeInstKlassPtr` represents an array type, it has to be `java.lang.Object`. By contraposition, if it is not `java.lang.Object`, we can conclude it is not an array, and we can skip some array checks, for instance. In this PR, we improve this deduction with interface-based reasoning: arrays implement only Cloneable and Serializable, so if a type implements anything else, it cannot be an array. This change partially reverts the changes from [JDK-8348631](https://bugs.openjdk.org/browse/JDK-8348631) (#23331) (in `LibraryCallKit::generate_array_guard_common`) and the test still passes. The way interfaces are checked might be done differently. The current situation is a balance between visibility (not leaking too many things that are explicitly private), not having overly general methods for one use case, and avoiding too concrete (and brittle) interfaces.
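The interface-based reasoning above can be illustrated at the Java level. The following is only a sketch of the idea, not the C2 implementation: `provablyNotArray` is a hypothetical helper name, and it works on `java.lang.Class` objects rather than on C2's `TypeInstKlassPtr`.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

public class ArrayInterfaceCheck {
    // Hypothetical helper: returns true if 'type' is, or implements, an interface
    // other than Cloneable/Serializable. Arrays implement exactly those two,
    // so any such type can never describe an array.
    static boolean provablyNotArray(Class<?> type) {
        if (type.isInterface() && type != Cloneable.class && type != Serializable.class) {
            return true;
        }
        // Walk the class hierarchy and look at the directly declared interfaces.
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Class<?> itf : c.getInterfaces()) {
                if (itf != Cloneable.class && itf != Serializable.class) {
                    return true;
                }
            }
        }
        return false; // no proof found; the type may or may not be an array
    }

    public static void main(String[] args) {
        // Every array class reports Cloneable and Serializable as its interfaces.
        System.out.println(Arrays.toString(int[].class.getInterfaces()));
        System.out.println(provablyNotArray(List.class));   // List is a non-array interface
        System.out.println(provablyNotArray(Object.class)); // Object could still be an array
        System.out.println(provablyNotArray(int[].class));  // an actual array
    }
}
```

Note that a `false` result proves nothing: `Object`, `Cloneable` and `Serializable` are exactly the types for which an array check is still needed.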
Tested with tier1..3, hs-precheckin-comp and hs-comp-stress Thanks, Marc ------------- Commit messages: - Revert now useless fix - Generalize the not-array proof Changes: https://git.openjdk.org/jdk/pull/24245/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24245&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8348853 Stats: 46 lines in 5 files changed: 35 ins; 7 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24245/head:pull/24245 PR: https://git.openjdk.org/jdk/pull/24245 From stefank at openjdk.org Wed Mar 26 10:26:12 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 26 Mar 2025 10:26:12 GMT Subject: RFR: 8352948: Remove leftover runtime_x86_32.cpp after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 09:31:57 GMT, Aleksey Shipilev wrote: > As I merged [JDK-8345169](https://bugs.openjdk.org/browse/JDK-8345169), I noticed one more file was left over: > > > $ find | grep x86_32 > ./src/hotspot/cpu/x86/runtime_x86_32.cpp > > > My bad, this PR removes that leftover. Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24249#pullrequestreview-2716613931 From jbhateja at openjdk.org Wed Mar 26 11:21:02 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 26 Mar 2025 11:21:02 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v4] In-Reply-To: References: Message-ID: > This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. 
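Why +/-0.0 and NaN need special handling can be shown with a small scalar model. This is a sketch under stated assumptions, not the patch's assembly: it uses plain `float` instead of `Float16`, `x86StyleMax` is a hypothetical stand-in for the hardware rule that the second source operand is returned when the inputs are both zero or when either is NaN, and `javaStyleMax` is a hypothetical name for the fix-up that recovers `Math.max` semantics by ordering the operands and checking the first one for NaN.

```java
public class MaxSemanticsModel {
    // Models the scalar x86 max rule: the result is src1 only if src1 > src2.
    // For equal zeros of either sign, or if either input is NaN, src2 is returned.
    static float x86StyleMax(float src1, float src2) {
        return (src1 > src2) ? src1 : src2;
    }

    // Recovers Java semantics on top of the rule above (illustrative only).
    static float javaStyleMax(float a, float b) {
        // Sign bit set covers negative numbers, -0.0f and negative NaNs.
        boolean aNegative = Float.floatToRawIntBits(a) < 0;
        // Put the operand that should win a zero tie (+0.0f) in the second slot.
        float first = aNegative ? a : b;
        float second = aNegative ? b : a;
        float result = x86StyleMax(first, second);
        // A NaN in the second slot already propagates; check the first slot explicitly.
        return Float.isNaN(first) ? first : result;
    }

    public static void main(String[] args) {
        float[] values = {0.0f, -0.0f, 1.0f, -1.0f, Float.NaN,
                          Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY};
        for (float a : values) {
            for (float b : values) {
                boolean same = Float.compare(javaStyleMax(a, b), Math.max(a, b)) == 0;
                System.out.println("max(" + a + ", " + b + ") matches Math.max: " + same);
            }
        }
    }
}
```

The min case is symmetric: the operand that should win a zero tie (-0.0) goes into the second slot instead.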
> > Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. > > Kindly review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.felixcloutier.com/x86/vminsh > [2] https://www.felixcloutier.com/x86/vmaxsh Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24169/files - new: https://git.openjdk.org/jdk/pull/24169/files/fe793a53..ec578e57 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=02-03 Stats: 11 lines in 2 files changed: 5 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24169.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24169/head:pull/24169 PR: https://git.openjdk.org/jdk/pull/24169 From jbhateja at openjdk.org Wed Mar 26 11:21:03 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 26 Mar 2025 11:21:03 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v2] In-Reply-To: <4IiXOB8F1-hsF_ulbTMZ-dlG7YWGieifLm-EiR5x24c=.aa0106aa-57b5-40ea-b515-cff933cbb2b5@github.com> References: <4IiXOB8F1-hsF_ulbTMZ-dlG7YWGieifLm-EiR5x24c=.aa0106aa-57b5-40ea-b515-cff933cbb2b5@github.com> Message-ID: On Tue, 25 Mar 2025 15:01:47 GMT, Sandhya Viswanathan wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7093: >> >>> 7091: } >>> 7092: >>> 7093: void C2_MacroAssembler::scalar_max_min_fp16(int opcode, XMMRegister dst, XMMRegister src1, XMMRegister src2, >> >> Any reason we are not doing this on lines of scalar emit_fp_min_max? 
For most common cases emit_fp_min_max based sequence would have much better latency. > > emit_fp_min_max in x86_64.ad doesn't have any blend emulation. Hi @sviswa7 , An instruction sequence similar to emit_fp_min_max for half floats prevents issuance of micro-ops from the Decoded ICache, which makes its performance worse than the proposed sequence; it seems the existence of several branches within a 32-byte window is the problem. Section 3.4.2.5 "Optimization for Decoded ICache" has more details on this. The proposed sequence is also vector-friendly. ![image](https://github.com/user-attachments/assets/0efcb12b-dcb4-4346-b3fa-9fefeb46636f) [max_micro_sequences.txt](https://github.com/user-attachments/files/19465321/max_micro_sequences.txt) Do you suggest going with the proposed performant sequence to fix this bug and addressing any shortcomings after more experimentation later? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2013923122 From thartmann at openjdk.org Wed Mar 26 12:30:54 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 26 Mar 2025 12:30:54 GMT Subject: RFR: 8352965: [BACKOUT] 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure Message-ID: The fix fails with CTW in tier3 (see JBS for more details). Clean backout.
Thanks, Tobias ------------- Commit messages: - Revert "8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure" Changes: https://git.openjdk.org/jdk/pull/24252/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24252&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352965 Stats: 99 lines in 7 files changed: 3 ins; 45 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/24252.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24252/head:pull/24252 PR: https://git.openjdk.org/jdk/pull/24252 From chagedorn at openjdk.org Wed Mar 26 12:35:22 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Mar 2025 12:35:22 GMT Subject: RFR: 8352965: [BACKOUT] 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 12:24:44 GMT, Tobias Hartmann wrote: > The fix fails with CTW in tier3 (see JBS for more details). Clean backout. > > Thanks, > Tobias Looks good and trivial. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24252#pullrequestreview-2716993367 From thartmann at openjdk.org Wed Mar 26 12:35:23 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 26 Mar 2025 12:35:23 GMT Subject: Integrated: 8352965: [BACKOUT] 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 12:24:44 GMT, Tobias Hartmann wrote: > The fix fails with CTW in tier3 (see JBS for more details). Clean backout. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 84d3dc75 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/84d3dc75e4ebd1a4724b09842fd5a63900536dd1 Stats: 99 lines in 7 files changed: 3 ins; 45 del; 51 mod 8352965: [BACKOUT] 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24252 From thartmann at openjdk.org Wed Mar 26 12:35:23 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 26 Mar 2025 12:35:23 GMT Subject: RFR: 8352965: [BACKOUT] 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 12:24:44 GMT, Tobias Hartmann wrote: > The fix fails with CTW in tier3 (see JBS for more details). Clean backout. > > Thanks, > Tobias Thanks Christian! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24252#issuecomment-2754248380 From eastigeevich at openjdk.org Wed Mar 26 12:48:22 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 26 Mar 2025 12:48:22 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v7] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 23:00:36 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. 
New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: > > - Relocate nmethod at safepoint > - Fix windows build src/hotspot/share/code/nmethod.cpp line 1492: > 1490: // Relocate nmethod at safepoint > 1491: VM_RelocateNMethod relocate_nmethod(nm, code_blob_type); > 1492: VMThread::execute(&relocate_nmethod); You should not do this here. It will be a responsibility of a caller to ensure relocation is done at a safepoint. A caller will get to a safepoint and relocate a bunch of nmethods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2014063631 From eastigeevich at openjdk.org Wed Mar 26 12:48:22 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 26 Mar 2025 12:48:22 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v7] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 12:43:22 GMT, Evgeny Astigeevich wrote: >> Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: >> >> - Relocate nmethod at safepoint >> - Fix windows build > > src/hotspot/share/code/nmethod.cpp line 1492: > >> 1490: // Relocate nmethod at safepoint >> 1491: VM_RelocateNMethod relocate_nmethod(nm, code_blob_type); >> 1492: VMThread::execute(&relocate_nmethod); > > You should not do this here. It will be a responsibility of a caller to ensure relocation is done at a safepoint. A caller will get to a safepoint and relocate a bunch of nmethods. In your PR this will be responsibility of WhiteBox. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2014067304 From eosterlund at openjdk.org Wed Mar 26 13:06:26 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 26 Mar 2025 13:06:26 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v7] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 23:00:36 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: > > - Relocate nmethod at safepoint > - Fix windows build I have only skimmed through what you are doing but what I have read makes me worried from a GC point of view. In general, I am not fond of "special nmethods" that work subtly different to normal nmethods and have their own special life cycles. It might be that some of my concerns are false because this is more of a drive-by review to sanity check if you thought about the GC implications. These are just random things off the top of my head. 1) You can't just copy oops. Propagating stale pointers to a new nmethod is not valid and will make the GC vomit.
The GC assumes that it can traverse a snapshot of nmethods, and that new nmethods created after that snapshot, will have sane valid oops initially, and hence do not need fixing. Copying stale oops to a new nmethod would violate those invariants and inevitably blow up. 2) Class redefinition tracks in an external data structure which nmethods contained metadata that we want to eventually throw away. This is done to avoid walking the entire code cache just to keep tabs on the one nmethod that still uses the old metadata. If we clone the nmethod without putting it in said data structure, we will blow up. 3) I'm worried about the initial state of the nmethod entry barrier guard value being copied from the source nmethod, instead of having the initial value we expect for newly created nmethods. It means that the initial invocation will not get the nmethod entry barrier callback. The GC traverses the nmethods assuming that new nmethods created during the traversal will not start off with weird stale values. 4) I'm worried about copying the nmethod epoch counters used by virtual threads to mark which nmethods have been found on-stack. Copying it implies that this nmethod has been found on-stack even though it never has. To me, the implications are unknown, but perhaps you thought about it? 5) You don't check if the nmethod is_unloading() when cloning it. That means you can create a new nmethod that has dead oops from the get go - that cannot be allowed 6) Have you checked what the JVMCI speculation data and JVMCI data contains and if your approach will break that? JVMCI has an nmethod mirror object that refers back to the nmethod - this is unlikely to work out of the box with cloning. 7) By running the operation in a safepoint you a) introduce an obvious latency problem, b) create a new source for stale nmethod pointers that will become stale and burn. The _nm of the safepoint operation might not survive a safepoint. 
For example, if a GC safepoint runs first, the GC might decide to unload the nmethod. It then traverses all known pointers to stale nmethods, and cleans them up so that nobody is referring to the nmethod any longer. Naturally, the GC won't know that there is a stale _nm pointer embedded into your VM operation. When you start messing around with it you enter a use-after-free situation and we will blow up. 8) What are the consequences of copying the deoptimization generation? I don't know! 9) Sometimes the method() is null when using Truffle. 10) Since you don't hold the Compile_lock across the safepoint, it's not obvious to me that you can't get a not_installed nmethod. Can you? I don't know what the consequences are of cloning one of those. The target nmethod will start off as not_installed, but I don't know that it will be made in_use. 11) These new special nmethods call post_init after installing the nmethod in the Method, while normally the order is reversed. While this may or may not be okay, it introduces a new anomaly where new special nmethods are being special. In general, every time that we have ever introduced "special nmethods" that work in different ways to "normal" nmethods, it has been a huge pain. With this approach to nmethod relocation, every time somebody adds a stateful field to nmethod, one will have to think very carefully about the impact on this cloning, and how that can end up affecting class redefinition, GC, etc. I really don't think we want this extra mental overhead, unless the motivation is exceptionally good. ------------- Changes requested by eosterlund (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23573#pullrequestreview-2717095140 From duke at openjdk.org Wed Mar 26 13:55:24 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Wed, 26 Mar 2025 13:55:24 GMT Subject: RFR: 8352490: Fatal error message for unhandled bytecode needs more detail [v2] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 14:19:37 GMT, Saranya Natarajan wrote: >> Description: Improve the error message for unhandled bytecode in `line#129` of function `Bytecodes::Code ciBytecodeStream::next_wide_or_table` in file ciStream.cpp >> >> Solution: The error message is improved to print OPCODE and bytecode index (BCI) > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > adding information for printing current method Thank you for reviewing the changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24187#issuecomment-2754479204 From mdoerr at openjdk.org Wed Mar 26 14:21:17 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Mar 2025 14:21:17 GMT Subject: RFR: 8352972: PPC64: Intrinsify Unsafe::setMemory Message-ID: Similar to the x86 implementation.

Before this patch:

Benchmark                       (aligned)  (size)  Mode  Cnt   Score   Error  Units
MemorySegmentZeroUnsafe.panama       true       1  avgt   30  15.048 ± 0.095  ns/op
MemorySegmentZeroUnsafe.panama       true       2  avgt   30  15.054 ± 0.089  ns/op
MemorySegmentZeroUnsafe.panama       true       3  avgt   30  15.161 ± 0.089  ns/op
MemorySegmentZeroUnsafe.panama       true       4  avgt   30  15.147 ± 0.082  ns/op
MemorySegmentZeroUnsafe.panama       true       5  avgt   30  15.198 ± 0.089  ns/op
MemorySegmentZeroUnsafe.panama       true       6  avgt   30  15.128 ± 0.099  ns/op
MemorySegmentZeroUnsafe.panama       true       7  avgt   30  19.234 ± 0.148  ns/op
MemorySegmentZeroUnsafe.panama       true       8  avgt   30  15.060 ± 0.090  ns/op
MemorySegmentZeroUnsafe.panama       true      15  avgt   30  19.229 ± 0.171  ns/op
MemorySegmentZeroUnsafe.panama       true      16  avgt   30  15.030 ± 0.082  ns/op
MemorySegmentZeroUnsafe.panama       true      63  avgt   30  85.290 ± 0.431  ns/op
MemorySegmentZeroUnsafe.panama       true      64  avgt   30  84.273 ± 0.843  ns/op
MemorySegmentZeroUnsafe.panama       true     255  avgt   30  89.551 ± 0.706  ns/op
MemorySegmentZeroUnsafe.panama       true     256  avgt   30  87.736 ± 0.679  ns/op
MemorySegmentZeroUnsafe.panama      false       1  avgt   30  15.044 ± 0.073  ns/op
MemorySegmentZeroUnsafe.panama      false       2  avgt   30  14.980 ± 0.058  ns/op
MemorySegmentZeroUnsafe.panama      false       3  avgt   30  15.138 ± 0.126  ns/op
MemorySegmentZeroUnsafe.panama      false       4  avgt   30  15.025 ± 0.049  ns/op
MemorySegmentZeroUnsafe.panama      false       5  avgt   30  15.192 ± 0.118  ns/op
MemorySegmentZeroUnsafe.panama      false       6  avgt   30  15.464 ± 0.667  ns/op
MemorySegmentZeroUnsafe.panama      false       7  avgt   30  19.179 ± 0.143  ns/op
MemorySegmentZeroUnsafe.panama      false       8  avgt   30  15.278 ± 0.130  ns/op
MemorySegmentZeroUnsafe.panama      false      15  avgt   30  19.428 ± 0.146  ns/op
MemorySegmentZeroUnsafe.panama      false      16  avgt   30  18.011 ± 1.233  ns/op
MemorySegmentZeroUnsafe.panama      false      63  avgt   30  87.090 ± 0.989  ns/op
MemorySegmentZeroUnsafe.panama      false      64  avgt   30  86.513 ± 0.623  ns/op
MemorySegmentZeroUnsafe.panama      false     255  avgt   30  89.415 ± 0.831  ns/op
MemorySegmentZeroUnsafe.panama      false     256  avgt   30  90.665 ± 0.798  ns/op
MemorySegmentZeroUnsafe.unsafe       true       1  avgt   30  86.530 ± 0.504  ns/op
MemorySegmentZeroUnsafe.unsafe       true       2  avgt   30  84.540 ± 0.399  ns/op
MemorySegmentZeroUnsafe.unsafe       true       3  avgt   30  86.954 ± 0.768  ns/op
MemorySegmentZeroUnsafe.unsafe       true       4  avgt   30  86.409 ± 0.801  ns/op
MemorySegmentZeroUnsafe.unsafe       true       5  avgt   30  86.774 ± 0.808  ns/op
MemorySegmentZeroUnsafe.unsafe       true       6  avgt   30  86.128 ± 0.804  ns/op
MemorySegmentZeroUnsafe.unsafe       true       7  avgt   30  86.512 ± 0.434  ns/op
MemorySegmentZeroUnsafe.unsafe       true       8  avgt   30  85.680 ± 0.335  ns/op
MemorySegmentZeroUnsafe.unsafe       true      15  avgt   30  88.098 ± 0.660  ns/op
MemorySegmentZeroUnsafe.unsafe       true      16  avgt   30  86.162 ± 0.634  ns/op
MemorySegmentZeroUnsafe.unsafe       true      63  avgt   30  87.605 ± 0.606  ns/op
MemorySegmentZeroUnsafe.unsafe       true      64  avgt   30  86.423 ± 0.667  ns/op
MemorySegmentZeroUnsafe.unsafe       true     255  avgt   30  89.882 ± 0.416  ns/op
MemorySegmentZeroUnsafe.unsafe       true     256  avgt   30  89.026 ± 0.555  ns/op
MemorySegmentZeroUnsafe.unsafe      false       1  avgt   30  86.808 ± 0.250  ns/op
MemorySegmentZeroUnsafe.unsafe      false       2  avgt   30  86.504 ± 0.427  ns/op
MemorySegmentZeroUnsafe.unsafe      false       3  avgt   30  87.304 ± 0.570  ns/op
MemorySegmentZeroUnsafe.unsafe      false       4  avgt   30  85.787 ± 0.395  ns/op
MemorySegmentZeroUnsafe.unsafe      false       5  avgt   30  86.032 ± 0.517  ns/op
MemorySegmentZeroUnsafe.unsafe      false       6  avgt   30  85.668 ± 0.414  ns/op
MemorySegmentZeroUnsafe.unsafe      false       7  avgt   30  85.621 ± 0.457  ns/op
MemorySegmentZeroUnsafe.unsafe      false       8  avgt   30  85.744 ± 0.384  ns/op
MemorySegmentZeroUnsafe.unsafe      false      15  avgt   30  85.898 ± 0.380  ns/op
MemorySegmentZeroUnsafe.unsafe      false      16  avgt   30  86.993 ± 0.532  ns/op
MemorySegmentZeroUnsafe.unsafe      false      63  avgt   30  86.700 ± 0.558  ns/op
MemorySegmentZeroUnsafe.unsafe      false      64  avgt   30  87.678 ± 0.721  ns/op
MemorySegmentZeroUnsafe.unsafe      false     255  avgt   30  91.774 ± 0.860  ns/op
MemorySegmentZeroUnsafe.unsafe      false     256  avgt   30  89.748 ± 0.749  ns/op

With this patch:

Benchmark                       (aligned)  (size)  Mode  Cnt   Score   Error  Units
MemorySegmentZeroUnsafe.panama       true       1  avgt   30  15.206 ± 0.113  ns/op
MemorySegmentZeroUnsafe.panama       true       2  avgt   30  15.106 ± 0.094  ns/op
MemorySegmentZeroUnsafe.panama       true       3  avgt   30  15.314 ± 0.118  ns/op
MemorySegmentZeroUnsafe.panama       true       4  avgt   30  15.067 ± 0.078  ns/op
MemorySegmentZeroUnsafe.panama       true       5  avgt   30  15.192 ± 0.094  ns/op
MemorySegmentZeroUnsafe.panama       true       6  avgt   30  15.145 ± 0.098  ns/op
MemorySegmentZeroUnsafe.panama       true       7  avgt   30  19.353 ± 0.176  ns/op
MemorySegmentZeroUnsafe.panama       true       8  avgt   30  15.164 ± 0.070  ns/op
MemorySegmentZeroUnsafe.panama       true      15  avgt   30  19.201 ± 0.103  ns/op
MemorySegmentZeroUnsafe.panama       true      16  avgt   30  15.138 ± 0.092  ns/op
MemorySegmentZeroUnsafe.panama       true      63  avgt   30  27.875 ± 0.783  ns/op
MemorySegmentZeroUnsafe.panama       true      64  avgt   30  19.560 ± 0.252  ns/op
MemorySegmentZeroUnsafe.panama       true     255  avgt   30  91.272 ± 0.568  ns/op
MemorySegmentZeroUnsafe.panama       true     256  avgt   30  19.582 ± 0.089  ns/op
MemorySegmentZeroUnsafe.panama      false       1  avgt   30  15.049 ± 0.117  ns/op
MemorySegmentZeroUnsafe.panama      false       2  avgt   30  15.096 ± 0.095  ns/op
MemorySegmentZeroUnsafe.panama      false       3  avgt   30  15.094 ± 0.073  ns/op
MemorySegmentZeroUnsafe.panama      false       4  avgt   30  15.012 ± 0.068  ns/op
MemorySegmentZeroUnsafe.panama      false       5  avgt   30  15.130 ± 0.121  ns/op
MemorySegmentZeroUnsafe.panama      false       6  avgt   30  15.079 ± 0.090  ns/op
MemorySegmentZeroUnsafe.panama      false       7  avgt   30  19.121 ± 0.120  ns/op
MemorySegmentZeroUnsafe.panama      false       8  avgt   30  15.153 ± 0.136  ns/op
MemorySegmentZeroUnsafe.panama      false      15  avgt   30  19.516 ± 0.101  ns/op
MemorySegmentZeroUnsafe.panama      false      16  avgt   30  19.054 ± 0.091  ns/op
MemorySegmentZeroUnsafe.panama      false      63  avgt   30  28.211 ± 0.742  ns/op
MemorySegmentZeroUnsafe.panama      false      64  avgt   30  30.415 ± 0.368  ns/op
MemorySegmentZeroUnsafe.panama      false     255  avgt   30  93.071 ± 0.785  ns/op
MemorySegmentZeroUnsafe.panama      false     256  avgt   30  93.184 ± 0.594  ns/op
MemorySegmentZeroUnsafe.unsafe       true       1  avgt   30  19.361 ± 0.085  ns/op
MemorySegmentZeroUnsafe.unsafe       true       2  avgt   30  19.415 ± 0.101  ns/op
MemorySegmentZeroUnsafe.unsafe       true       3  avgt   30  19.198 ± 0.111  ns/op
MemorySegmentZeroUnsafe.unsafe       true       4  avgt   30  19.380 ± 0.066  ns/op
MemorySegmentZeroUnsafe.unsafe       true       5  avgt   30  19.107 ± 0.057  ns/op
MemorySegmentZeroUnsafe.unsafe       true       6  avgt   30  19.097 ± 0.064  ns/op
MemorySegmentZeroUnsafe.unsafe       true       7  avgt   30  19.520 ± 0.381  ns/op
MemorySegmentZeroUnsafe.unsafe       true       8  avgt   30  19.406 ± 0.093  ns/op
MemorySegmentZeroUnsafe.unsafe       true      15  avgt   30  19.210 ± 0.049  ns/op
MemorySegmentZeroUnsafe.unsafe       true      16  avgt   30  19.459 ± 0.092  ns/op
MemorySegmentZeroUnsafe.unsafe       true      63  avgt   30  29.300 ± 0.235  ns/op
MemorySegmentZeroUnsafe.unsafe       true      64  avgt   30  19.200 ± 0.080  ns/op
MemorySegmentZeroUnsafe.unsafe       true     255  avgt   30  91.678 ± 0.243  ns/op
MemorySegmentZeroUnsafe.unsafe       true     256  avgt   30  19.793 ± 0.139  ns/op
MemorySegmentZeroUnsafe.unsafe      false       1  avgt   30  19.430 ± 0.082  ns/op
MemorySegmentZeroUnsafe.unsafe      false       2  avgt   30  19.469 ± 0.106  ns/op
MemorySegmentZeroUnsafe.unsafe      false       3  avgt   30  19.264 ± 0.123  ns/op
MemorySegmentZeroUnsafe.unsafe      false       4  avgt   30  19.260 ± 0.080  ns/op
MemorySegmentZeroUnsafe.unsafe      false       5  avgt   30  19.210 ± 0.068  ns/op
MemorySegmentZeroUnsafe.unsafe      false       6  avgt   30  19.240 ± 0.066  ns/op
MemorySegmentZeroUnsafe.unsafe      false       7  avgt   30  20.132 ± 0.375  ns/op
MemorySegmentZeroUnsafe.unsafe      false       8  avgt   30  20.148 ± 0.358  ns/op
MemorySegmentZeroUnsafe.unsafe      false      15  avgt   30  19.405 ± 0.154  ns/op
MemorySegmentZeroUnsafe.unsafe      false      16  avgt   30  19.375 ± 0.119  ns/op
MemorySegmentZeroUnsafe.unsafe      false      63  avgt   30  29.458 ± 0.491  ns/op
MemorySegmentZeroUnsafe.unsafe      false      64  avgt   30  29.554 ± 0.817  ns/op
MemorySegmentZeroUnsafe.unsafe      false     255  avgt   30  93.094 ± 0.789  ns/op
MemorySegmentZeroUnsafe.unsafe      false     256  avgt   30  93.630 ± 0.869  ns/op

`Unsafe` cases with small Cnt are significantly faster. Aligned large cases, too.
-------------

Commit messages:
 - 8352972: PPC64: Intrinsify Unsafe::setMemory

Changes: https://git.openjdk.org/jdk/pull/24254/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24254&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8352972
  Stats: 109 lines in 1 file changed: 109 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/24254.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24254/head:pull/24254

PR: https://git.openjdk.org/jdk/pull/24254

From duke at openjdk.org Wed Mar 26 14:34:47 2025
From: duke at openjdk.org (Manuel Hässig)
Date: Wed, 26 Mar 2025 14:34:47 GMT
Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v3]
In-Reply-To:
References:
Message-ID: <5OOpDW693XhGVePrz6zYlm9gMZKneFaIfl6BJP-NQb0=.c5757e84-6b14-442b-b822-d0b1eeb5f913@github.com>

> # Issue Summary
>
> When running with `-XX:-UseLoopPredicate`, C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates.
>
> # Change Summary
>
> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention.
>
> Concretely, this PR
> - adds parse predicate nodes to the IR testing framework,
> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off,
> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency,
> - adds a regression test.
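The flag dependency described in the list above can be sketched as follows. This is only an illustration under stated assumptions: `LoopPredicateFlags`, `apply_flag_dependency` and `profiled_predicates_enabled` are hypothetical names, and the two booleans stand in for the real VM flags rather than using HotSpot's flag machinery.

```cpp
#include <cassert>

// Hypothetical stand-in for the two VM flags (not HotSpot's real flag API).
struct LoopPredicateFlags {
  bool use_loop_predicate = true;
  bool use_profiled_loop_predicate = true;
};

// Argument-level rule: profiled loop predicates are treated as a subset of
// loop predicates, so turning the superset off also turns the subset off.
inline LoopPredicateFlags apply_flag_dependency(LoopPredicateFlags flags) {
  if (!flags.use_loop_predicate) {
    flags.use_profiled_loop_predicate = false;
  }
  return flags;
}

// Use-site rule: check the loop predicate flag first, then the profiled one,
// so every check states the dependency explicitly.
inline bool profiled_predicates_enabled(const LoopPredicateFlags& flags) {
  return flags.use_loop_predicate && flags.use_profiled_loop_predicate;
}
```

With this shape, a command line that only disables loop predicates ends up with both kinds of predicates disabled, while disabling only the profiled flag leaves regular loop predicates untouched.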
>
>
> # Testing
>
> The changes passed the following testing:
> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038)
> - tier1 through tier3 and Oracle internal testing

Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:

  ir-framework: rename new nodes to convention

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24248/files
  - new: https://git.openjdk.org/jdk/pull/24248/files/08afa3d5..6f015a67

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=01-02

  Stats: 10 lines in 2 files changed: 0 ins; 0 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/24248.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24248/head:pull/24248

PR: https://git.openjdk.org/jdk/pull/24248

From roland at openjdk.org Wed Mar 26 14:38:21 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Wed, 26 Mar 2025 14:38:21 GMT
Subject: RFR: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does [v14]
In-Reply-To:
References:
Message-ID:

On Fri, 21 Mar 2025 13:28:50 GMT, Emanuel Peter wrote:

>> @eme64 any update on testing?
>
> @rwestrel Testing looks good, thanks for the ping :)

@eme64 thanks for running tests and for the review.
@chhagedorn @jaskarth thanks for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23438#issuecomment-2754656863

From roland at openjdk.org Wed Mar 26 14:41:12 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Wed, 26 Mar 2025 14:41:12 GMT
Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2]
In-Reply-To:
References: <2OJ3-DbdBBiHjZKVDGNXb-IFLqXi0jI4Ezz_zs6NsMA=.a904b56a-a3c4-4f06-a708-70223beccb60@github.com>
Message-ID:

On Wed, 26 Mar 2025 08:46:32 GMT, Christian Hagedorn wrote:

> What do you think? I will also have a closer look at the actual code later this week.
Thanks for the update. That sounds good to me. I'll update the change with the new command line flag. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2754668270 From roland at openjdk.org Wed Mar 26 14:41:28 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 26 Mar 2025 14:41:28 GMT Subject: Integrated: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does In-Reply-To: References: Message-ID: <5p0v_U4BzoVK3_7SxAB8hOks_LKCo18xONpeqH81ulU=.b045f84b-48c6-42d4-be99-70d8a3894b00@github.com> On Tue, 4 Feb 2025 13:02:47 GMT, Roland Westrelin wrote: > This change refactors `RShiftI`/`RshiftL` `Ideal`, `Identity` and > `Value` because the `int` and `long` versions are very similar and so > there's no logic duplication. In the process, support for some extra > transformations is added to `RShiftL`. I also added some new test > cases. This pull request has now been integrated. Changeset: 79bffe2f Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/79bffe2f28f90986d45f4e91efc021290b4fc00a Stats: 373 lines in 8 files changed: 238 ins; 58 del; 77 mod 8349361: C2: RShiftL should support all applicable transformations that RShiftI does Reviewed-by: epeter, chagedorn, jkarthikeyan ------------- PR: https://git.openjdk.org/jdk/pull/23438 From chagedorn at openjdk.org Wed Mar 26 15:08:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 26 Mar 2025 15:08:15 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v3] In-Reply-To: <5OOpDW693XhGVePrz6zYlm9gMZKneFaIfl6BJP-NQb0=.c5757e84-6b14-442b-b822-d0b1eeb5f913@github.com> References: <5OOpDW693XhGVePrz6zYlm9gMZKneFaIfl6BJP-NQb0=.c5757e84-6b14-442b-b822-d0b1eeb5f913@github.com> Message-ID: On Wed, 26 Mar 2025 14:34:47 GMT, Manuel H?ssig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of 
loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates.
>>
>> # Change Summary
>>
>> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention.
>>
>> Concretely, this PR
>> - adds parse predicate nodes to the IR testing framework,
>> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off,
>> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency,
>> - adds a regression test.
>>
>>
>> # Testing
>>
>> The changes passed the following testing:
>> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038)
>> - tier1 through tier3 and Oracle internal testing
>
> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:
>
> ir-framework: rename new nodes to convention

A few comments but overall it looks good. Thanks for cleaning that up!

src/hotspot/share/opto/c2_globals.hpp line 789:

> 787: product(bool, UseProfiledLoopPredicate, true, \
> 788: "Move predicates out of loops based on profiling data. " \
> 789: "Requires UseLoopPredicate to be turned on (default).") \

It was already a bit vague before, but I suggest being more precise and saying that we move checks with an uncommon trap out of a loop (the resulting check before the loop is then a predicate):

    Move checks with an uncommon trap out of loops based on profiling data. Requires [...]

src/hotspot/share/opto/loopnode.cpp line 4304:

> 4302: tty->print(" profile_predicated");
> 4303: }
> 4304: if (UseLoopPredicate && predicates.loop_predicate_block()->is_non_empty()) {

Maybe you can merge these blocks:

    if (UseLoopPredicate) {
      if (UseProfiledLoopPredicate && predicates.profiled_loop_predicate_block()->is_non_empty()) {
        tty->print(" profile_predicated");
      }
      if (predicates.loop_predicate_block()->is_non_empty()) {
        tty->print(" predicated");
      }
    }

test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1524:

> 1522: public static final String PARSE_PREDICATE_LOOP = PREFIX + "PARSE_PREDICATE_LOOP" + POSTFIX;
> 1523: static {
> 1524: parsePredicateNodes(PARSE_PREDICATE_LOOP, "Loop");

I suggest the following names found in `predicates.hpp`:
https://github.com/openjdk/jdk/blob/79bffe2f28f90986d45f4e91efc021290b4fc00a/src/hotspot/share/opto/predicates.hpp#L48-L50

test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 2763:

> 2761: IR_NODE_MAPPINGS.put(irNodePlaceholder, new SinglePhaseRangeEntry(CompilePhase.AFTER_PARSING, regex,
> 2762: CompilePhase.AFTER_PARSING,
> 2763: CompilePhase.CCP1));

I think the legal last phase should be `CompilePhase.PHASEIDEALLOOP_ITERATIONS`, where we could still observe `ParsePredicates`:
Suggestion:

    CompilePhase.PHASEIDEALLOOP_ITERATIONS));

test/hotspot/jtreg/compiler/predicates/TestDisabledLoopPredicates.java line 41:

> 39: static final int WARMUP = 10_000;
> 40: static final int SIZE = 100;
> 41: static final int min = 3;

Since `min` is also a constant, you should capitalize it.

test/hotspot/jtreg/compiler/predicates/TestDisabledLoopPredicates.java line 46:

> 44: TestFramework.runWithFlags("-XX:+UseLoopPredicate",
> 45: "-XX:+UseProfiledLoopPredicate");
> 46: TestFramework.runWithFlags("-XX:-UseLoopPredicate");

You could also add a run where you only disable `-XX:-UseProfiledLoopPredicate` for completeness and add an IR rule accordingly.
test/hotspot/jtreg/compiler/predicates/TestDisabledLoopPredicates.java line 49: > 47: } > 48: > 49: @Run(test = { "test" }) Braces are not required here: Suggestion: @Run(test = "test") test/hotspot/jtreg/compiler/predicates/TestDisabledLoopPredicates.java line 72: > 70: > 71: @Test > 72: @IR(counts = { IRNode.PARSE_PREDICATE_LOOP, "=1", The `=` is not required: Suggestion: @IR(counts = { IRNode.PARSE_PREDICATE_LOOP, "1", test/hotspot/jtreg/compiler/predicates/TestDisabledLoopPredicates.java line 74: > 72: @IR(counts = { IRNode.PARSE_PREDICATE_LOOP, "=1", > 73: IRNode.PARSE_PREDICATE_PROFILED_LOOP, "1" }, > 74: phase = CompilePhase.AFTER_PARSING, `phase` is not required since you've decided that `AFTER_PARSING` is the default phase where we match this node on. You only need to specify `phase` if you want to match on a different phase. ------------- Changes requested by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24248#pullrequestreview-2717534252 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014385700 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014346920 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014355627 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014361300 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014364335 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014371293 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014363733 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014364939 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2014366623 From mdoerr at openjdk.org Wed Mar 26 15:23:28 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Mar 2025 15:23:28 GMT Subject: RFR: 8352972: PPC64: Intrinsify Unsafe::setMemory [v2] In-Reply-To: References: Message-ID: > Similar to the x86 implementation. 
> > Before this patch (measured on Power10): > > Benchmark (aligned) (size) Mode Cnt Score Error Units > MemorySegmentZeroUnsafe.panama true 1 avgt 30 15.048 ? 0.095 ns/op > MemorySegmentZeroUnsafe.panama true 2 avgt 30 15.054 ? 0.089 ns/op > MemorySegmentZeroUnsafe.panama true 3 avgt 30 15.161 ? 0.089 ns/op > MemorySegmentZeroUnsafe.panama true 4 avgt 30 15.147 ? 0.082 ns/op > MemorySegmentZeroUnsafe.panama true 5 avgt 30 15.198 ? 0.089 ns/op > MemorySegmentZeroUnsafe.panama true 6 avgt 30 15.128 ? 0.099 ns/op > MemorySegmentZeroUnsafe.panama true 7 avgt 30 19.234 ? 0.148 ns/op > MemorySegmentZeroUnsafe.panama true 8 avgt 30 15.060 ? 0.090 ns/op > MemorySegmentZeroUnsafe.panama true 15 avgt 30 19.229 ? 0.171 ns/op > MemorySegmentZeroUnsafe.panama true 16 avgt 30 15.030 ? 0.082 ns/op > MemorySegmentZeroUnsafe.panama true 63 avgt 30 85.290 ? 0.431 ns/op > MemorySegmentZeroUnsafe.panama true 64 avgt 30 84.273 ? 0.843 ns/op > MemorySegmentZeroUnsafe.panama true 255 avgt 30 89.551 ? 0.706 ns/op > MemorySegmentZeroUnsafe.panama true 256 avgt 30 87.736 ? 0.679 ns/op > MemorySegmentZeroUnsafe.panama false 1 avgt 30 15.044 ? 0.073 ns/op > MemorySegmentZeroUnsafe.panama false 2 avgt 30 14.980 ? 0.058 ns/op > MemorySegmentZeroUnsafe.panama false 3 avgt 30 15.138 ? 0.126 ns/op > MemorySegmentZeroUnsafe.panama false 4 avgt 30 15.025 ? 0.049 ns/op > MemorySegmentZeroUnsafe.panama false 5 avgt 30 15.192 ? 0.118 ns/op > MemorySegmentZeroUnsafe.panama false 6 avgt 30 15.464 ? 0.667 ns/op > MemorySegmentZeroUnsafe.panama false 7 avgt 30 19.179 ? 0.143 ns/op > MemorySegmentZeroUnsafe.panama false 8 avgt 30 15.278 ? 0.130 ns/op > MemorySegmentZeroUnsafe.panama false 15 avgt 30 19.428 ? 0.146 ns/op > MemorySegmentZeroUnsafe.panama false 16 avgt 30 18.011 ? 1.233 ns/op > MemorySegmentZeroUnsafe.panama false 63 avgt 30 87.090 ? 0.989 ns/op > MemorySegmentZeroUnsafe.panama false 64 avgt 30 86.513 ? 0.623 ns/op > MemorySegmentZeroUnsafe.panama false 255 avgt 30 89.415 ? 
0.831 ns/op
> MemorySegmentZeroUnsafe.panama false 256 avgt 30 90.665 ±...

Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

  Remove unused Label.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24254/files
  - new: https://git.openjdk.org/jdk/pull/24254/files/fdcd2092..b47017cc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24254&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24254&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/24254.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24254/head:pull/24254

PR: https://git.openjdk.org/jdk/pull/24254

From duke at openjdk.org Wed Mar 26 15:27:39 2025
From: duke at openjdk.org (Manuel Hässig)
Date: Wed, 26 Mar 2025 15:27:39 GMT
Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v4]
In-Reply-To:
References:
Message-ID:

> # Issue Summary
>
> When running with `-XX:-UseLoopPredicate`, C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates.
>
> # Change Summary
>
> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention.
>
> Concretely, this PR
> - adds parse predicate nodes to the IR testing framework,
> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off,
> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency,
> - adds a regression test.
>
>
> # Testing
>
> The changes passed the following testing:
> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038)
> - tier1 through tier3 and Oracle internal testing

Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:

  Apply suggestions from @chhagedorn

  Co-authored-by: Christian Hagedorn

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24248/files
  - new: https://git.openjdk.org/jdk/pull/24248/files/6f015a67..ea653995

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=02-03

  Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/24248.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24248/head:pull/24248

PR: https://git.openjdk.org/jdk/pull/24248

From kbarrett at openjdk.org Wed Mar 26 15:28:20 2025
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 26 Mar 2025 15:28:20 GMT
Subject: RFR: 8352141: UBSAN: fix the left shift of negative value in relocInfo.cpp, internal_word_Relocation::pack_data_to()
In-Reply-To:
References:
Message-ID:

On Mon, 24 Mar 2025 13:18:25 GMT, Afshin Zafari wrote:

> The `offset` variable used in left-shift op can be a large number with its sign-bit set. This makes a negative value which is UB for left-shift and is reported as
> `runtime error: left shift of negative value -25 at relocInfo.cpp:...`
>
> Using `java_left_shif()` function is the workaround to avoid UB. This function uses reinterpret_cast to cast from signed to unsigned and back.
>
> Tests:
> linux-x64-debug tier1 on a UBSAN enabled build.

I think the description in the PR is wrong. I think the `offset` variable is not a positive number so large that the sign bit gets set (effectively treating it as unsigned). Rather, it's explicitly negated in `scaled_offset`. So this code is (almost) always doing a shift of a negative value.
(Looks like @dean-long found this too.) The comments I made about shift of negative values in https://github.com/openjdk/jdk/pull/24184 also apply here. I'm inclined to call the ubsan warning here an effectively false positive, because we don't (and never have) cared about the technical UB that no implementation is making use of and will be going away with C++20. I think a new `left_shift_no_overflow()` operation isn't really needed here. As @vnkozlov notes, the range of values is just not going to be so large that overflow is an issue here. Though such an operation might be useful for other reasons. It would be a shared place to hang the ubsan-ignore of the shift of a negative value, rather than ATTRIBUTE_NO_UBSAN littering. Maybe instead the negation of the offset could be removed, and wherever the value is used could be updated to account for that change. Or maybe we should just ignore the ubsan warning here. This is not an appropriate use of a JAVA_INTEGER{_SHIFT}_OP. https://github.com/openjdk/jdk/blame/a81250c55312dfdeb4d65970cff683e6f0783ca7/src/hotspot/share/utilities/globalDefinitions.hpp#L1201-L1204 // Sum and product which can never overflow: they wrap, just like the // Java operations. Note that we don't intend these to be used for // general-purpose arithmetic: their purpose is to emulate Java // operations. That is, they should only be used where we're emulating Java arithmetic, such as by the compiler constant folding a Java arithmetic expression. That isn't the case here. (The comment only mentions "sum and product" because it wasn't updated to account for the later addition of the shift operations.) 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24196#issuecomment-2754822720 From shade at openjdk.org Wed Mar 26 16:11:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Mar 2025 16:11:10 GMT Subject: RFR: 8352948: Remove leftover runtime_x86_32.cpp after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 09:31:57 GMT, Aleksey Shipilev wrote: > As I merged [JDK-8345169](https://bugs.openjdk.org/browse/JDK-8345169), I noticed one more file was left over: > > > $ find | grep x86_32 > ./src/hotspot/cpu/x86/runtime_x86_32.cpp > > > My bad, this PR removes that leftover. Thanks! Would you (or any other Reviewer) consider this trivial? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24249#issuecomment-2754965477 From mdoerr at openjdk.org Wed Mar 26 16:36:10 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Mar 2025 16:36:10 GMT Subject: RFR: 8352972: PPC64: Intrinsify Unsafe::setMemory [v3] In-Reply-To: References: Message-ID: > Similar to the x86 implementation. > > Before this patch (measured on Power10): > > Benchmark (aligned) (size) Mode Cnt Score Error Units > MemorySegmentZeroUnsafe.panama true 1 avgt 30 15.048 ? 0.095 ns/op > MemorySegmentZeroUnsafe.panama true 2 avgt 30 15.054 ? 0.089 ns/op > MemorySegmentZeroUnsafe.panama true 3 avgt 30 15.161 ? 0.089 ns/op > MemorySegmentZeroUnsafe.panama true 4 avgt 30 15.147 ? 0.082 ns/op > MemorySegmentZeroUnsafe.panama true 5 avgt 30 15.198 ? 0.089 ns/op > MemorySegmentZeroUnsafe.panama true 6 avgt 30 15.128 ? 0.099 ns/op > MemorySegmentZeroUnsafe.panama true 7 avgt 30 19.234 ? 0.148 ns/op > MemorySegmentZeroUnsafe.panama true 8 avgt 30 15.060 ? 0.090 ns/op > MemorySegmentZeroUnsafe.panama true 15 avgt 30 19.229 ? 0.171 ns/op > MemorySegmentZeroUnsafe.panama true 16 avgt 30 15.030 ? 0.082 ns/op > MemorySegmentZeroUnsafe.panama true 63 avgt 30 85.290 ? 0.431 ns/op > MemorySegmentZeroUnsafe.panama true 64 avgt 30 84.273 ? 
0.843 ns/op > MemorySegmentZeroUnsafe.panama true 255 avgt 30 89.551 ? 0.706 ns/op > MemorySegmentZeroUnsafe.panama true 256 avgt 30 87.736 ? 0.679 ns/op > MemorySegmentZeroUnsafe.panama false 1 avgt 30 15.044 ? 0.073 ns/op > MemorySegmentZeroUnsafe.panama false 2 avgt 30 14.980 ? 0.058 ns/op > MemorySegmentZeroUnsafe.panama false 3 avgt 30 15.138 ? 0.126 ns/op > MemorySegmentZeroUnsafe.panama false 4 avgt 30 15.025 ? 0.049 ns/op > MemorySegmentZeroUnsafe.panama false 5 avgt 30 15.192 ? 0.118 ns/op > MemorySegmentZeroUnsafe.panama false 6 avgt 30 15.464 ? 0.667 ns/op > MemorySegmentZeroUnsafe.panama false 7 avgt 30 19.179 ? 0.143 ns/op > MemorySegmentZeroUnsafe.panama false 8 avgt 30 15.278 ? 0.130 ns/op > MemorySegmentZeroUnsafe.panama false 15 avgt 30 19.428 ? 0.146 ns/op > MemorySegmentZeroUnsafe.panama false 16 avgt 30 18.011 ? 1.233 ns/op > MemorySegmentZeroUnsafe.panama false 63 avgt 30 87.090 ? 0.989 ns/op > MemorySegmentZeroUnsafe.panama false 64 avgt 30 86.513 ? 0.623 ns/op > MemorySegmentZeroUnsafe.panama false 255 avgt 30 89.415 ? 0.831 ns/op > MemorySegmentZeroUnsafe.panama false 256 avgt 30 90.665 ?... Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Simplify usage of UnsafeMemoryAccessMark. 

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24254/files
  - new: https://git.openjdk.org/jdk/pull/24254/files/b47017cc..d4bc3feb

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24254&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24254&range=01-02

  Stats: 22 lines in 1 file changed: 0 ins; 9 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/24254.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24254/head:pull/24254

PR: https://git.openjdk.org/jdk/pull/24254

From mdoerr at openjdk.org Wed Mar 26 16:37:14 2025
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 26 Mar 2025 16:37:14 GMT
Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory
In-Reply-To:
References:
Message-ID: <7jVhKpVfHqZRix25pTXA28NA2RW2OKuWH1XR-TIfpLw=.ccd76cbd-7d73-4df6-8f54-0512a1d0cae9@github.com>

On Tue, 4 Mar 2025 09:46:53 GMT, Anjian-Wen wrote:

> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory. This intrinsic optimizes about 15%-20% unsafe setmemory time

You may want to use `UnsafeMemoryAccessMark` as on x86.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2755044189

From zzambers at openjdk.org Wed Mar 26 17:43:54 2025
From: zzambers at openjdk.org (Zdenek Zambersky)
Date: Wed, 26 Mar 2025 17:43:54 GMT
Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option
In-Reply-To:
References:
Message-ID:

On Wed, 26 Mar 2025 17:37:58 GMT, Zdenek Zambersky wrote:

> This adds `@requires vm.compiler2.enabled` to tests, which fail with `Unrecognized VM option` on client VM.

Attached file which shows unrecognized VM options for individual tests.
[unrecognized-options.txt](https://github.com/user-attachments/files/19472912/unrecognized-options.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2755259048 From zzambers at openjdk.org Wed Mar 26 17:43:54 2025 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Wed, 26 Mar 2025 17:43:54 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option Message-ID: This adds `@requires vm.compiler2.enabled` to tests, which fail with `Unrecognized VM option` on client VM. ------------- Commit messages: - Added missing requires for c2 compiler tests Changes: https://git.openjdk.org/jdk/pull/24262/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24262&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8252473 Stats: 80 lines in 71 files changed: 71 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24262.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24262/head:pull/24262 PR: https://git.openjdk.org/jdk/pull/24262 From shade at openjdk.org Wed Mar 26 17:52:13 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Mar 2025 17:52:13 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 17:37:58 GMT, Zdenek Zambersky wrote: > This adds `@requires vm.compiler2.enabled` to tests, which fail with `Unrecognized VM option` on client VM. test/hotspot/jtreg/compiler/arraycopy/TestCloneWithStressReflectiveCode.java line 28: > 26: * @bug 8284951 > 27: * @summary Test clone intrinsic with StressReflectiveCode. > 28: * @requires vm.compiler2.enabled & vm.debug Drive-by comment: multiple `@requires` get AND-ed automatically, so you can just drop a new line with `@requires vm.compiler2.enabled`, and it will still work. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24262#discussion_r2014731449 From kbarrett at openjdk.org Wed Mar 26 18:17:07 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 26 Mar 2025 18:17:07 GMT Subject: RFR: 8352141: UBSAN: fix the left shift of negative value in relocInfo.cpp, internal_word_Relocation::pack_data_to() In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 13:18:25 GMT, Afshin Zafari wrote: > The `offset` variable used in left-shift op can be a large number with its sign-bit set. This makes a negative value which is UB for left-shift and is reported as > `runtime error: left shift of negative value -25 at relocInfo.cpp:...` > > Using `java_left_shif()` function is the workaround to avoid UB. This function uses reinterpret_cast to cast from signed to unsigned and back. > > Tests: > linux-x64-debug tier1 on a UBSAN enabled build. Maybe the suggested helper function should just be called `left_shift`. Document as allowing negative base, but asserting no overflow. 
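
A helper along the lines suggested above (negative base allowed, overflow asserted) could look roughly like the sketch below. It is an assumption-laden illustration, not an existing HotSpot function: the name `left_shift` comes from the suggestion, while the template shape, the use of plain `assert`, and the bounds check are choices made here.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: shift a signed value left, allow a negative base,
// and assert that no significant bits are shifted out.
template <typename T>
inline T left_shift(T value, unsigned shift) {
  static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                "signed integral type expected");
  using U = typename std::make_unsigned<T>::type;
  assert(shift < static_cast<unsigned>(std::numeric_limits<U>::digits) &&
         "shift count out of range");
  // The result is representable iff value lies in [-(max >> shift) - 1, max >> shift].
  const T upper = static_cast<T>(std::numeric_limits<T>::max() >> shift);
  const T lower = static_cast<T>(-upper - 1);
  assert(value >= lower && value <= upper && "left shift would overflow");
  // Shift in the unsigned domain, where the operation is well defined. The
  // conversion back to T is modular on mainstream compilers and guaranteed
  // to be so from C++20 on.
  return static_cast<T>(static_cast<U>(value) << shift);
}
```

For the value from the UBSAN report, `left_shift<int32_t>(-25, 2)` yields `-100` without shifting a negative signed operand, while a call that would lose bits trips the assert in debug builds instead of wrapping silently.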
------------- PR Comment: https://git.openjdk.org/jdk/pull/24196#issuecomment-2755369418 From sparasa at openjdk.org Wed Mar 26 18:37:21 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 26 Mar 2025 18:37:21 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: <_QvyAWuOP7uiUcWy0jEb6tN0CNIQOAQqZh8-7BxIWy4=.5a406f3e-ad92-492d-84c9-a3ef7e7941b2@github.com> References: <_QvyAWuOP7uiUcWy0jEb6tN0CNIQOAQqZh8-7BxIWy4=.5a406f3e-ad92-492d-84c9-a3ef7e7941b2@github.com> Message-ID: On Tue, 25 Mar 2025 23:27:28 GMT, Vladimir Ivanov wrote: >> Rohit Arul Raj has updated the pull request incrementally with one additional commit since the last revision: >> >> create a separate method to check for cpu's supporting avx512 version of simd sort > > src/hotspot/cpu/x86/vm_version_x86.hpp line 778: > >> 776: static bool supports_avx512_simd_sort() { >> 777: // Disable AVX512 version of SIMD Sort on AMD Zen4 Processors >> 778: return ((is_intel() || (is_amd() && (cpu_family() > CPU_FAMILY_AMD_19H))) && supports_avx512dq()); } > > It's quite hard to parse. The following looks clearer to me: > > if (supports_avx512dq()) { > // Disable AVX512 version of SIMD Sort on AMD Zen4 Processors. > if (is_amd() && cpu_family() == CPU_FAMILY_AMD_19H) { > return false; > } > return true; > } > return false; I second the suggested refactoring. 
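
One way to keep the early-return style while preserving the original condition is sketched below. Everything here is a stand-in for illustration: `CpuInfo` replaces the static queries in `vm_version_x86.hpp`, the value `0x19` for `CPU_FAMILY_AMD_19H` is an assumption, and the two function names are hypothetical. As far as the quoted snippets go, the suggested form appears to return true for any non-Zen-4 CPU with AVX512DQ, which is slightly wider than the original expression; the sketch keeps the original behavior.

```cpp
#include <cassert>

// Hypothetical stand-in for the CPU queries used in the review.
struct CpuInfo {
  bool intel;
  bool amd;
  int  family;
  bool avx512dq;
};

// Assumed value, for illustration only.
constexpr int CPU_FAMILY_AMD_19H = 0x19;

// The one-line condition quoted in the review.
inline bool simd_sort_original(const CpuInfo& cpu) {
  return (cpu.intel || (cpu.amd && cpu.family > CPU_FAMILY_AMD_19H)) && cpu.avx512dq;
}

// Early-return form that keeps the Intel check explicit.
inline bool simd_sort_refactored(const CpuInfo& cpu) {
  if (!cpu.avx512dq) {
    return false;
  }
  if (cpu.intel) {
    return true;
  }
  // AMD: only families newer than 19h (Zen 4) use the AVX512 version.
  return cpu.amd && cpu.family > CPU_FAMILY_AMD_19H;
}
```

Comparing both functions over all flag combinations, as a small test can do, is a cheap way to confirm that a readability refactoring of this kind does not change which CPUs are selected.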
Need to make sure the original `is_intel()` check is also included appropriately in the logic :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24053#discussion_r2014795963

From sviswanathan at openjdk.org Wed Mar 26 18:40:10 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Wed, 26 Mar 2025 18:40:10 GMT
Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v2]
In-Reply-To:
References: <4IiXOB8F1-hsF_ulbTMZ-dlG7YWGieifLm-EiR5x24c=.aa0106aa-57b5-40ea-b515-cff933cbb2b5@github.com>
Message-ID:

On Wed, 26 Mar 2025 11:17:42 GMT, Jatin Bhateja wrote:

>> emit_fp_min_max in x86_64.ad doesn't have any blend emulation.
>
> Hi @sviswa7,
> An instruction sequence similar to emit_fp_min_max for half floats prevents issuance of micro-ops from the Decoded ICache, which makes its performance worse than the proposed sequence; it seems the existence of several branches within a 32-byte window is the problem. Section 3.4.2.5 "Optimization for Decoded ICache" has more details on this. The proposed sequence is also vector-friendly.
>
> ![image](https://github.com/user-attachments/assets/0efcb12b-dcb4-4346-b3fa-9fefeb46636f)
>
> [max_micro_sequences.txt](https://github.com/user-attachments/files/19465321/max_micro_sequences.txt)
>
> Do you suggest going with the proposed performant sequence to fix this bug and addressing any shortcomings after more experimentation later?

Thanks for investigating this, let us take it up in a separate PR.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2014797608 From sviswanathan at openjdk.org Wed Mar 26 18:40:11 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 26 Mar 2025 18:40:11 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v4] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 11:21:02 GMT, Jatin Bhateja wrote: >> This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. >> >> Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.felixcloutier.com/x86/vminsh >> [2] https://www.felixcloutier.com/x86/vmaxsh > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 50: > 48: > 49: @Test > 50: @IR(counts = {IRNode.MAX_HF, " >0 "}, applyIfCPUFeatureAnd = {"avx512_fp16", "true", "avx512bw", "true"}) Missing avx512vl check here and other places below. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2014800161 From kvn at openjdk.org Wed Mar 26 18:48:13 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 26 Mar 2025 18:48:13 GMT Subject: RFR: 8350577: Fix missing Assertion Predicates when splitting loops In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 09:21:50 GMT, Christian Hagedorn wrote: > _Note: The actual fix is only ~80 changed lines - everything else is about tests._ > > After integrating many preparatory sub-tasks, I'm finally fixing the last outstanding Assertion Predicate issues with this patch. > > For more background about Assertion Predicates, have a look at the following [blog post](https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html). > > ### Maintain Assertion Predicates when Splitting a Loop > When performing Loop Predication on a counted loop, we create two Template Assertion Predicates for each hoisted range check. Whenever we split this loop as part of loop opts, we need to establish the Template Assertion Predicates at the new sub loops with the new loop init and stride and create Initialized Assertion Predicates from them. This ensures that a sub loop is folded when it becomes dead due to an impossible condition (e.g. always having a negative index for the hoisted range check in all loop iterations). > > #### Current State > Today, we are already covering most of the cases where Assertion Predicates are required - but we are still missing some. The following table describes the current state for the different loop splitting optimizations: > > | Loop Optimization | Template Assertion Predicate | Initialized Assertion Predicate | > | ------------------------ | --------------------------------------- | --------------------------------------- | > | Create Main Loop | ? | ? | > | Create Post Loop | ? | ? | > | Loop Unswitching | ? | _not required, same init, stride and, limit_ | > | Loop Unrolling | ? | ? 
| > | Range Check Elimination | ? | ? | > | Loop Peeling | ? | ? | > | Splitting Main Loop | ? | ? | > > Whenever we apply a loop optimization that does not establish Template Assertion Predicates, then all subsequent loop splitting optimizations on that loop cannot establish Template Assertion Predicates, either, and we fail to emit Initialized Assertion Predicates which can lead to a broken graph. > > #### Fixing Unsupported Cases > This patch provides fixes for the remaining unsupported cases as shown in the table above. With all the work done in previous PRs, the fix is quite straight forward: > - Remove the restriction that we don't clone Template Assertion Predicate in Loop Peeling and post loop creation. > - Remove the restriction that we only clone Template Assertion Predicate ... This simplifies the code. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24246#pullrequestreview-2718305992 From kvn at openjdk.org Wed Mar 26 18:52:07 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 26 Mar 2025 18:52:07 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 17:37:58 GMT, Zdenek Zambersky wrote: > This adds `@requires vm.compiler2.enabled` to tests, which fail with `Unrecognized VM option` on client VM. Can we run some of them with Graal? When no C2 specific flags are used. 
------------- PR Review: https://git.openjdk.org/jdk/pull/24262#pullrequestreview-2718316754 From kvn at openjdk.org Wed Mar 26 18:57:06 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 26 Mar 2025 18:57:06 GMT Subject: RFR: 8352948: Remove leftover runtime_x86_32.cpp after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 09:31:57 GMT, Aleksey Shipilev wrote: > As I merged [JDK-8345169](https://bugs.openjdk.org/browse/JDK-8345169), I noticed one more file was left over: > > > $ find | grep x86_32 > ./src/hotspot/cpu/x86/runtime_x86_32.cpp > > > My bad, this PR removes that leftover. Good. You got 2 approvals. I don't think you need to wait 24 hours for this. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24249#pullrequestreview-2718329258 PR Comment: https://git.openjdk.org/jdk/pull/24249#issuecomment-2755460702 From jbhateja at openjdk.org Wed Mar 26 19:24:51 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 26 Mar 2025 19:24:51 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v5] In-Reply-To: References: Message-ID: > This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. > > Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin > > [1] https://www.felixcloutier.com/x86/vminsh > [2] https://www.felixcloutier.com/x86/vmaxsh Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24169/files - new: https://git.openjdk.org/jdk/pull/24169/files/ec578e57..5bc21b99 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=03-04 Stats: 8 lines in 1 file changed: 0 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24169.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24169/head:pull/24169 PR: https://git.openjdk.org/jdk/pull/24169 From kvn at openjdk.org Wed Mar 26 19:33:12 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 26 Mar 2025 19:33:12 GMT Subject: RFR: 8350471: Unhandled compilation bailout in GraphKit::builtin_throw In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 08:55:10 GMT, Manuel Hässig wrote: > # Issue Summary > > When creating a builtin exception node, a stress test decided to bail out as if the allocation of the builtin exception objects had failed. Since these are preallocated at VM creation, the test failure is a false positive. > > # Change Rationale > > `GraphKit::builtin_throw()` features a bailout check after getting an appropriate exception object. However, up to that point, the execution in `builtin_throw()` cannot fail. In particular, there can be no failure to allocate the exception because these are all preallocated during `Threads::create_vm()` startup in `universe_post_init()` and `Threads::initialize_java_lang_classes()`. Further, none of the three callers handles a possible bailout in `builtin_throw()`. 
Hence, this PR removes the bailout check responsible for the test failure. > > # Testing > > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078715650) > - tier1 through tier3 and Oracle internal testing So where does the failed state come from? ------------- PR Review: https://git.openjdk.org/jdk/pull/24243#pullrequestreview-2718413407 From shade at openjdk.org Wed Mar 26 19:48:18 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Mar 2025 19:48:18 GMT Subject: RFR: 8352948: Remove leftover runtime_x86_32.cpp after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 09:31:57 GMT, Aleksey Shipilev wrote: > As I merged [JDK-8345169](https://bugs.openjdk.org/browse/JDK-8345169), I noticed one more file was left over: > > > $ find | grep x86_32 > ./src/hotspot/cpu/x86/runtime_x86_32.cpp > > > My bad, this PR removes that leftover. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24249#issuecomment-2755582754 From shade at openjdk.org Wed Mar 26 19:48:20 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Mar 2025 19:48:20 GMT Subject: Integrated: 8352948: Remove leftover runtime_x86_32.cpp after 32-bit x86 removal In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 09:31:57 GMT, Aleksey Shipilev wrote: > As I merged [JDK-8345169](https://bugs.openjdk.org/browse/JDK-8345169), I noticed one more file was left over: > > > $ find | grep x86_32 > ./src/hotspot/cpu/x86/runtime_x86_32.cpp > > > My bad, this PR removes that leftover. This pull request has now been integrated. 
Changeset: e83cccfe Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/e83cccfed4463ddbec5493722355d65c4eb41646 Stats: 332 lines in 1 file changed: 0 ins; 332 del; 0 mod 8352948: Remove leftover runtime_x86_32.cpp after 32-bit x86 removal Reviewed-by: stefank, kvn ------------- PR: https://git.openjdk.org/jdk/pull/24249 From jbhateja at openjdk.org Wed Mar 26 20:00:01 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 26 Mar 2025 20:00:01 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v7] In-Reply-To: References: Message-ID: > This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 > > The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). > > Summary of changes included with the patch: > 1. C2 compiler New Vector IR creation. > 2. Auto-vectorization support. > 3. x86 backend implementation. > 4. New IR verification test for each newly supported vector operation. 
> > Following are the performance numbers of Float16OperationsBenchmark > > System : Intel(R) Xeon(R) Processor code-named Granite rapids > Frequency fixed at 2.5 GHz > > > Baseline > Benchmark (vectorDim) Mode Cnt Score Error Units > Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms > Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms > Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms > Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms > Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms > Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms > Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms > Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms > Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms > Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms > Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms > Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms > Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms > Float16OperationsBenchmark.isInfiniteCMovBenchmark 1024 thrpt 2 1962.579 ops/ms... 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding tests for new float16 Generator ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22755/files - new: https://git.openjdk.org/jdk/pull/22755/files/d5da1405..ce3abe77 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=05-06 Stats: 129 lines in 8 files changed: 116 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/22755.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755 PR: https://git.openjdk.org/jdk/pull/22755 From dlong at openjdk.org Wed Mar 26 20:52:09 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 26 Mar 2025 20:52:09 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 08:34:35 GMT, Matthias Baesken wrote: > > Yes, we should fix frequency calculation in a separate follow up > > Do you want me to open a JBS issue for this? Yes, please. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23962#issuecomment-2755713667 From dlong at openjdk.org Wed Mar 26 20:52:10 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 26 Mar 2025 20:52:10 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 08:33:56 GMT, Matthias Baesken wrote: > Okay so not INT_MAX but 100 ; should I do it from both `from_pct ` and `to_pct ` ? I think to_pct is enough. I don't think it can happen on from_pct. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23962#issuecomment-2755715792 From jbhateja at openjdk.org Wed Mar 26 21:18:32 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 26 Mar 2025 21:18:32 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v8] In-Reply-To: References: Message-ID: > This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 > > The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). > > Summary of changes included with the patch: > 1. C2 compiler New Vector IR creation. > 2. Auto-vectorization support. > 3. x86 backend implementation. > 4. New IR verification test for each newly supported vector operation. > > Following are the performance numbers of Float16OperationsBenchmark > > System : Intel(R) Xeon(R) Processor code-named Granite rapids > Frequency fixed at 2.5 GHz > > > Baseline > Benchmark (vectorDim) Mode Cnt Score Error Units > Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms > Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms > Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms > Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms > Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms > Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms > Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms > Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms > Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms > Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms > Float16OperationsBenchmark.isFiniteCMovBenchmark 
1024 thrpt 2 1881.652 ops/ms > Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms > Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms > Float16OperationsBenchmark.isInfiniteCMovBenchmark 1024 thrpt 2 1962.579 ops/ms... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Some re-factoring ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22755/files - new: https://git.openjdk.org/jdk/pull/22755/files/ce3abe77..6f89f3f3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=06-07 Stats: 9 lines in 4 files changed: 0 ins; 6 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22755.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755 PR: https://git.openjdk.org/jdk/pull/22755 From sviswanathan at openjdk.org Wed Mar 26 22:02:07 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 26 Mar 2025 22:02:07 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v5] In-Reply-To: References: Message-ID: <0JWo3XA9uKQe19BygMwL80FvNu-OM9Uv79YbFlUwQf8=.a730d809-20cb-4caf-93ac-d4b3f14de02d@github.com> On Wed, 26 Mar 2025 19:24:51 GMT, Jatin Bhateja wrote: >> This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. >> >> Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> Jatin >> >> [1] https://www.felixcloutier.com/x86/vminsh >> [2] https://www.felixcloutier.com/x86/vmaxsh > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24169#pullrequestreview-2718706615 From vlivanov at openjdk.org Wed Mar 26 22:11:08 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 26 Mar 2025 22:11:08 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations [v3] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 14:00:27 GMT, Hannes Greule wrote: >> subnode has multiple nodes that are self-inverse but lacking the respective optimization. ReverseINode and ReverseLNode already have the optimization, but we can deduplicate the code for all those operations. >> >> For most nodes, the optimization is obvious. The NegF/DNodes however are worth to look at in detail imo: >> - `Float.NaN` has the same bits set as `-Float.NaN`. That means, it this specific case, the operation is a no-op anyway >> - For other values, the msb is flipped, flipping twice results in the original value again. >> >> Similar changes could be made to the corresponding vector nodes. If you want, I can do that in a follow-up RFE. >> >> One note: During benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like >> >> int v = 0; >> for (int datum : data) { >> v ^= Integer.reverseBytes(Integer.reverseBytes(datum)); >> } >> return v; >> >> was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > add tests for ReverseBytesS/ReverseBytesUS Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
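The self-inverse property that the JDK-8350988 change relies on can be checked at the Java level. The sketch below is an illustration only: it exercises the library methods that correspond to the reverse, reverse-bytes, and negate nodes mentioned in the description, and the class and method names are hypothetical. It says nothing about how C2 implements the `Identity` transformation itself.

```java
public class SelfInverseSketch {
    // Returns true if applying each operation twice yields the original value.
    static boolean roundTrips(int x, long y, short s, char c, float f, double d) {
        return Integer.reverseBytes(Integer.reverseBytes(x)) == x
            && Integer.reverse(Integer.reverse(x)) == x
            && Long.reverseBytes(Long.reverseBytes(y)) == y
            && Long.reverse(Long.reverse(y)) == y
            && Short.reverseBytes(Short.reverseBytes(s)) == s
            && Character.reverseBytes(Character.reverseBytes(c)) == c
            // floatToIntBits/doubleToLongBits canonicalize NaN, so the
            // comparison also holds when f or d is NaN.
            && Float.floatToIntBits(-(-f)) == Float.floatToIntBits(f)
            && Double.doubleToLongBits(-(-d)) == Double.doubleToLongBits(d);
    }

    public static void main(String[] args) {
        boolean ok = roundTrips(0x12345678, 0x0123456789ABCDEFL, (short) 0x1234, '\u1234', -0.0f, Double.NaN)
                  && roundTrips(-1, Long.MIN_VALUE, Short.MIN_VALUE, '\uffff', Float.NaN, -0.0);
        System.out.println("all operations round-trip: " + ok);
    }
}
```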
PR Review: https://git.openjdk.org/jdk/pull/23851#pullrequestreview-2718720573 From vlivanov at openjdk.org Wed Mar 26 23:31:11 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 26 Mar 2025 23:31:11 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v6] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 03:30:36 GMT, Quan Anh Mai wrote: >> Thanks for the pointers, @merykitty. >> >> First of all, all aforementioned PRs/RFEs focus on new functionality. Any experiments migrating existing use cases (in particular, final graph reshaping and post-loop opts GVN ones)? >> >> I see one reference to a PR dependent on proposed logic, so I'll comment on it ([PR #22886](https://github.com/openjdk/jdk/pull/22886)): >> * It looks strange to see such transformations happening in x86-specific code. Are other platforms expected to reimplement it one by one? (I'd expect to see expansion logic in shared code guarded by `Matcher::match_rule_supported_vector()`. And `VectorCastNode` looks like the best place for it.) >> * How much does it benefit from a full-blown GVN? For example, there's already some basic redundancy elimination happening during final graph reshaping. Will it be enough here? >> >> Overall, I'm still not convinced that the proposed patch (as it is shaped now) is the right way to go. What I'm looking for is more experimental data on the usage patterns where lowering takes place (new functionality is fine, but I'm primarily interested in migrating existing use cases). >> >> So far, I see 2 types of scenarios either benefitting from delayed GVN transformations (post-loop opts GVN transformations, macro node lowering, GC barriers expansion) or requiring ad-hoc platform-specific IR tweaks to simplify matching (happening during final graph reshaping). But it's still an open question to me what is the best way to cover ad-hoc platform-specific transformations on Ideal graph you seem to care about the most. 
>> >> From maintenance perspective, it would help a lot to be able to easily share code across multiple ports while keeping ad-hoc platform-specific transformations close to the place where their results are consumed (in AD files). > >> First of all, all aforementioned PRs/RFEs focus on new functionality. > > I don't know where you get this impression from. Most of the aforementioned PRs/RFEs are existing transformations, we just do it elsewhere. > > #22922 is currently done in idealization in a clumsy manner, it would be better to do it with the consideration of the underlying hardware, since it is the entire purpose of that transformation. > >> Some examples that I have given regarding vector insertion and vector extraction. > > This is done during code emission, which does not benefit from common expression elimination > > https://bugs.openjdk.org/browse/JDK-8345812 is currently done during parsing, it would be easier for the autovectorizer to use the node if it wants to if we do the transformation later. > > For existing use cases, you can find a lot of them scattering around: > > - Transformation of `MulNode` to `LShiftNode` that we have covered above. > > - `CMoveNode` tries to push 0 to the right because on x86, making a constant 0 kills the flag register, and `cmov` is a 2-address instruction that kills the first input. > > - `final_graph_reshaping_impl` tries to swap the inputs of some nodes because on x86, these are 2-address instructions that kill the first input. > > - There are some transformations in `final_graph_reshaping_main_switch` that are guarded with `Matcher`, if we move them to lowering we can skip these queries. > > - A lot of use cases you can find in code emission (a.k.a. x86.ad). It makes sense, because everything you can do during lowering can be done during code emission, just in a less efficient manner. 
At this point you also have the most knowledge and can transform the instructions arbitrarily without worrying about other architectures. Some notable examples: min/max are expanded into compare and cmov, reverse short is implemented by reverse int and a right shift, `Conv2B` is just compare with 0 and setcc, a lot of vector nodes, etc. > >> I see one reference to a PR dependent on proposed logic, so I'll comment on it (https://github.com/openjdk/jdk/pull/22886): > > For the first question, the reason I believe is that it is not always possible to extract and insert elements into a vector efficiently. On x86 it takes maximum 2 instructions to extract a vector element and 3 instructions to insert an element into a vector. > > For the second question, without lowering the cost is miserable, if you are unpacking and packing a vector of 4 longs: > > // unpacking > movq rax, xmm0 > vpextrq rcx, xm... @merykitty it feels to me our discussion has been going around in circles. This PR proposes a new way to perform IR lowering. So far, I see [#22886](https://github.com/openjdk/jdk/pull/22886) which illustrates its intended usage. Any other examples? >> I see one reference to a PR dependent on proposed logic, so I'll comment on it (https://github.com/openjdk/jdk/pull/22886): > For the first question, the reason I believe is that it is not always possible to extract and insert elements into a vector efficiently. The primary reason why `VectorCastL2[FD]`/`VectorCastD2[IL]` aren't supported yet is because there's no proper hardware support available on x86 until AVX512DQ. So, instead of handcoding a naive version by hand, the patch proposes to implement it by expanding corresponding nodes into a series of scalar operations. From Vector API perspective, it's still a huge win since it eliminates vector boxing/unboxing. Such transformation is inherently platform-agnostic, so putting such code in platform-specific files doesn't look right to me. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2755994338 From liach at openjdk.org Wed Mar 26 23:37:13 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 26 Mar 2025 23:37:13 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v44] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 18:29:34 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. >> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. 
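The bounds reasoning behind the XOR change described above can be sketched for the simplest case. This is a hedged illustration, not the code from the PR: it assumes both inputs are non-negative with known upper bounds `hiA` and `hiB`, and the class and method names are hypothetical. The observation is that `a ^ b` cannot set any bit above the highest bit set in `hiA | hiB`.

```java
public class XorBoundSketch {
    // Upper bound of a ^ b for 0 <= a <= hiA and 0 <= b <= hiB.
    // Assumes hiA >= 0 and hiB >= 0.
    static int xorUpperBound(int hiA, int hiB) {
        int m = hiA | hiB;
        if (m == 0) {
            return 0; // both inputs are the constant 0
        }
        // All ones from the highest set bit of m downward.
        return -1 >>> Integer.numberOfLeadingZeros(m);
    }

    public static void main(String[] args) {
        // Exhaustive check over 4-bit ranges, in the spirit of the
        // test_xor_node.cpp test mentioned in the description.
        for (int hiA = 0; hiA < 16; hiA++) {
            for (int hiB = 0; hiB < 16; hiB++) {
                int bound = xorUpperBound(hiA, hiB);
                for (int a = 0; a <= hiA; a++) {
                    for (int b = 0; b <= hiB; b++) {
                        if ((a ^ b) > bound) {
                            throw new AssertionError(a + " ^ " + b + " exceeds " + bound);
                        }
                    }
                }
            }
        }
        System.out.println("bound holds for all 4-bit ranges");
    }
}
```

When both ranges collapse to single constants, the exact value `a ^ b` is known, which is the constant-folding case the PR gives priority to.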
> > Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: > > Undo accidental changes to Int tests @iwanowww Can you help review this, which causes expand and compress with constant mask not constant fold on aarch (x64 uses instructions) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2756000381 From qamai at openjdk.org Thu Mar 27 00:28:14 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 27 Mar 2025 00:28:14 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v6] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 23:28:51 GMT, Vladimir Ivanov wrote: >>> First of all, all aforementioned PRs/RFEs focus on new functionality. >> >> I don't know where you get this impression from. Most of the aforementioned PRs/RFEs are existing transformations, we just do it elsewhere. >> >> #22922 is currently done in idealization in a clumsy manner, it would be better to do it with the consideration of the underlying hardware, since it is the entire purpose of that transformation. >> >>> Some examples that I have given regarding vector insertion and vector extraction. >> >> This is done during code emission, which does not benefit from common expression elimination >> >> https://bugs.openjdk.org/browse/JDK-8345812 is currently done during parsing, it would be easier for the autovectorizer to use the node if it wants to if we do the transformation later. >> >> For existing use cases, you can find a lot of them scattering around: >> >> - Transformation of `MulNode` to `LShiftNode` that we have covered above. >> >> - `CMoveNode` tries to push 0 to the right because on x86, making a constant 0 kills the flag register, and `cmov` is a 2-address instruction that kills the first input. >> >> - `final_graph_reshaping_impl` tries to swap the inputs of some nodes because on x86, these are 2-address instructions that kill the first input. 
>> >> - There are some transformations in `final_graph_reshaping_main_switch` that are guarded with `Matcher`, if we move them to lowering we can skip these queries. >> >> - A lot of use cases you can find in code emission (a.k.a. x86.ad). It makes sense, because everything you can do during lowering can be done during code emission, just in a less efficient manner. At this point you also have the most knowledge and can transform the instructions arbitrarily without worrying about other architectures. Some notable examples: min/max are expanded into compare and cmov, reverse short is implemented by reverse int and a right shift, `Conv2B` is just compare with 0 and setcc, a lot of vector nodes, etc. >> >>> I see one reference to a PR dependent on proposed logic, so I'll comment on it (https://github.com/openjdk/jdk/pull/22886): >> >> For the first question, the reason I believe is that it is not always possible to extract and insert elements into a vector efficiently. On x86 it takes maximum 2 instructions to extract a vector element and 3 instructions to insert an element into a vector. >> >> For the second question, without lowering the cost is miserable, if you are unpacking and packing a vector of 4 longs: >>... > > @merykitty it feels to me our discussion has been going around in circles. > > This PR proposes a new way to perform IR lowering. So far, I see [#22886](https://github.com/openjdk/jdk/pull/22886) which illustrates its intended usage. Any other examples? > >>> I see one reference to a PR dependent on proposed logic, so I'll comment on it (https://github.com/openjdk/jdk/pull/22886): >> For the first question, the reason I believe is that it is not always possible to extract and insert elements into a vector efficiently. > > The primary reason why `VectorCastL2[FD]`/`VectorCastD2[IL]` aren't supported yet is because there's no proper hardware support available on x86 until AVX512DQ. 
So, instead of handcoding a naive version by hand, the patch proposes to implement it by expanding corresponding nodes into a series of scalar operations. From Vector API perspective, it's still a huge win since it eliminates vector boxing/unboxing. Such transformation is inherently platform-agnostic, so putting such code in platform-specific files doesn't look right to me. @iwanowww I struggle to understand what you are expecting right now. How can there be examples other than you having to imagine from my words if we don't currently have the tool? Do you have any alternative idea to solve the issue of platform-dependent lowering that benefits from GVN? In particular, how do you propose to solve the puzzle of transforming this set of Java code into this set of instructions? // unpacking LongVector v; v1 = v.lane(0); v2 = v.lane(1); v3 = v.lane(2); v4 = v.lane(3); // packing LongVector v = LongVector.zero(LongVector.SPECIES_256); v = v.withLane(0, v1); v = v.withLane(1, v2); v = v.withLane(2, v3); v = v.withLane(3, v4); // unpacking movq rax, xmm0 vpextrq rcx, xmm0, 1 vextracti128 xmm1, ymm0, 1 movq rdx, xmm1 vpextrq rbx, xmm1, 1 // packing vmovq xmm0, rax vinsrq xmm0, xmm0, rcx, 1 vmovq xmm1, rdx vinsrq xmm1, xmm1, rbx, 1 vinserti128 ymm0, ymm0, xmm1, 1 > From Vector API perspective, it's still a huge win since it eliminates vector boxing/unboxing Not if the cost of extracting and inserting elements is large since we are doing a lot of them here. And even if we can do it in all platforms, I don't see why we can't start with one architecture and expand the transformation to the others later. The function that does the transformation can be put in an arch-independent file that is called from lowering in an arch-dependent file. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2756056351 From vlivanov at openjdk.org Thu Mar 27 01:01:32 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 27 Mar 2025 01:01:32 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v6] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 00:25:14 GMT, Quan Anh Mai wrote: > I struggle to understand what are you expecting right now. I'm encouraging some experiments to justify proposed design. Do you suggest to take the patch for granted and then think through all the consequences later? I hope not. >> From Vector API perspective, it's still a huge win since it eliminates vector boxing/unboxing > Not if the cost of extracting and inserting elements is large since we are doing a lot of them here. It's still cheaper than boxing/unboxing and iterating over in-memory representation. (FTR 512-bit vector of longs/doubles has 8 elements.) > And even if we can do it in all platforms, I don't see why we can't start with one architecture and expand the transformation to the others later. The function that does the transformation can be put in an arch-independent file that is called from lowering in an arch-dependent file. Simply because there are better ways to solve that particular problem. (IMO post-loop opts IGVN and `Matcher::match_rule_supported_vector()` enable cleaner implementation.) 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2756108334 PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2756110229 From duke at openjdk.org Thu Mar 27 03:18:21 2025 From: duke at openjdk.org (Anjian-Wen) Date: Thu, 27 Mar 2025 03:18:21 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory In-Reply-To: <7jVhKpVfHqZRix25pTXA28NA2RW2OKuWH1XR-TIfpLw=.ccd76cbd-7d73-4df6-8f54-0512a1d0cae9@github.com> References: <7jVhKpVfHqZRix25pTXA28NA2RW2OKuWH1XR-TIfpLw=.ccd76cbd-7d73-4df6-8f54-0512a1d0cae9@github.com> Message-ID: On Wed, 26 Mar 2025 16:34:58 GMT, Martin Doerr wrote: >> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory. This intrinsic optimizes about 15%-20% unsafe setmemory time > > You may want to use `UnsafeMemoryAccessMark` as on x86. @TheRealMDoerr Thanks for your kind reply. I found that the main logic on x86 is in the 'generate_unsafe_setmemory' function, while the main logic on riscv and aarch64 is in generate_fill. I have not found 'UnsafeMemoryAccess' on aarch64 in generate_fill. I will check whether we need to add it, and where to insert it on riscv if needed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2756440074 From swen at openjdk.org Thu Mar 27 05:20:10 2025 From: swen at openjdk.org (Shaojin Wen) Date: Thu, 27 Mar 2025 05:20:10 GMT Subject: RFR: 8352316: More MergeStoreBench In-Reply-To: References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: On Wed, 26 Mar 2025 01:57:23 GMT, Shaojin Wen wrote: >> Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works > > I'm a developer of fastjson2.
According to third-party benchmarks from https://github.com/fabienrenaud/java-json-benchmark, our library demonstrates the best performance. I would like to contribute some of these optimization techniques to OpenJDK, ideally by having C2 (the JIT compiler) directly support them.
>
> Below is an example related to this PR. We have a JavaBean that needs to be serialized to a JSON string:
>
> * JavaBean
>
> class Bean {
>     public int value;
> }
>
> * Target JSON Output
>
> {"value":123}
>
> * CodeGen-Generated JSONSerializer
> fastjson2 uses ASM to generate a serializer class like the following. The methods writeNameValue0, writeNameValue1, and writeNameValue2 are candidate implementations. Among them, writeNameValue2 is the fastest when the field name length is 8, as it leverages UNSAFE.putLong for direct memory operations:
>
> class BeanJSONSerializer {
>     private static final String name = "\"value\":";
>     private static final byte[] nameBytes = name.getBytes();
>     private static final long nameLong = UNSAFE.getLong(nameBytes, ARRAY_BYTE_BASE_OFFSET);
>
>     int writeNameValue0(byte[] bytes, int off, int value) {
>         name.getBytes(0, 8, bytes, off);
>         off += 8;
>         return writeInt32(bytes, off, value);
>     }
>
>     int writeNameValue1(byte[] bytes, int off, int value) {
>         System.arraycopy(nameBytes, 0, bytes, off, 8);
>         off += 8;
>         return writeInt32(bytes, off, value);
>     }
>
>     int writeNameValue2(byte[] bytes, int off, int value) {
>         UNSAFE.putLong(bytes, ARRAY_BYTE_BASE_OFFSET + off, nameLong);
>         off += 8;
>         return writeInt32(bytes, off, value);
>     }
> }
>
> We propose that the C2 compiler could optimize cases where the field name length is 4 or 8 bytes by automatically using direct memory operations similar to writeNameValue2. This would eliminate the need for manual unsafe operations in user code and improve serialization performance for common patterns.

> @wenshao Do you have any insight from this benchmark? What was your motivation for it?
> > I also wonder if an IR test for some of the cases would be helpful. IR tests give us more info about what the compiler produced, and if there is a change in VM behaviour the IR test catches it in regular testing. Benchmarks are not run regularly, and regressions would therefore not be caught. I submitted this benchmark to prove that the performance of System.arraycopy or String.getBytes can be improved by Unsafe.putInt/putLong. I hope C2 can do this optimization automatically. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2756725833 From galder at openjdk.org Thu Mar 27 05:22:33 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Thu, 27 Mar 2025 05:22:33 GMT Subject: RFR: 8344942: Template-Based Testing Framework In-Reply-To: References: Message-ID: <_ISX3hSaQuWOvkh8KUsOS69y_aDB6JZGSWsVT1DWq4k=.de29649c-00f1-4e84-9a46-75ef89e8e30a@github.com> On Tue, 25 Mar 2025 08:31:36 GMT, Emanuel Peter wrote: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/jtreg:./test/lib compiler.lib.template_framework` > > **History** > @TobiHartmann and I have played with code generators for a while, and have had the dream of doing that in a more principled way. And to hopefully... Looks great though I'm not too familiar with the code to be able to do a reasonable review, but I had a question: Have you got any practical use case that can show where you've used this and show what it takes to build such a use case? `VectorReduction2` or similar type of microbenchmarks would be great to see auto generated using this? 
The reason I ask this is because I feel that something that is missing in this PR is a small practical use case where this framework is put into action to actually generate some jtreg/IR/microbenchmark test and see how it runs as part of the CI in the PR. WDYT? ------------- PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2719765351 From chagedorn at openjdk.org Thu Mar 27 05:58:11 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 27 Mar 2025 05:58:11 GMT Subject: RFR: 8350577: Fix missing Assertion Predicates when splitting loops In-Reply-To: References: Message-ID: <_pdL-_hwaTL3JqdPQom-TQ4CJ5oouKw0piWZNDJsRuA=.e4b3080b-9c10-4d68-a473-3a1cf654d89c@github.com> On Wed, 26 Mar 2025 09:21:50 GMT, Christian Hagedorn wrote: > _Note: The actual fix is only ~80 changed lines - everything else is about tests._ > > After integrating many preparatory sub-tasks, I'm finally fixing the last outstanding Assertion Predicate issues with this patch. > > For more background about Assertion Predicates, have a look at the following [blog post](https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html). > > ### Maintain Assertion Predicates when Splitting a Loop > When performing Loop Predication on a counted loop, we create two Template Assertion Predicates for each hoisted range check. Whenever we split this loop as part of loop opts, we need to establish the Template Assertion Predicates at the new sub loops with the new loop init and stride and create Initialized Assertion Predicates from them. This ensures that a sub loop is folded when it becomes dead due to an impossible condition (e.g. always having a negative index for the hoisted range check in all loop iterations). > > #### Current State > Today, we are already covering most of the cases where Assertion Predicates are required - but we are still missing some. 
The following table describes the current state for the different loop splitting optimizations: > > | Loop Optimization | Template Assertion Predicate | Initialized Assertion Predicate | > | ------------------------ | --------------------------------------- | --------------------------------------- | > | Create Main Loop | ? | ? | > | Create Post Loop | ? | ? | > | Loop Unswitching | ? | _not required, same init, stride and, limit_ | > | Loop Unrolling | ? | ? | > | Range Check Elimination | ? | ? | > | Loop Peeling | ? | ? | > | Splitting Main Loop | ? | ? | > > Whenever we apply a loop optimization that does not establish Template Assertion Predicates, then all subsequent loop splitting optimizations on that loop cannot establish Template Assertion Predicates, either, and we fail to emit Initialized Assertion Predicates which can lead to a broken graph. > > #### Fixing Unsupported Cases > This patch provides fixes for the remaining unsupported cases as shown in the table above. With all the work done in previous PRs, the fix is quite straight forward: > - Remove the restriction that we don't clone Template Assertion Predicate in Loop Peeling and post loop creation. > - Remove the restriction that we only clone Template Assertion Predicate ... Indeed, it does :-) Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24246#issuecomment-2756797943 From kvn at openjdk.org Thu Mar 27 06:24:07 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 27 Mar 2025 06:24:07 GMT Subject: RFR: 8352426: RelocIterator should correctly handle nullptr address of relocation data In-Reply-To: References: Message-ID: <2KI1B9dDNLlPF7pYQFINo9-F_TehTbbJ_D-sJP4CZmE=.6f9f748f-1933-49b3-b897-f8f786d77e7c@github.com> On Mon, 24 Mar 2025 17:04:04 GMT, Boris Ulasevich wrote: > This is a follow-up to the recent #24102 patch. It addresses an issue where RelocIterator may receive a nullptr as the relocation table address. 
This change can also serve as an independent fix for JDK-8352112. > > RelocIterator::initialize() and RelocIterator::next() perform decrement/increment operations on an internal relocaction pointer. > If nm->relocation_begin() returns nullptr, this results in undefined behavior, as pointer arithmetic on nullptr is prohibited by the C++ Standard. > > Instead of introducing a null-check (which would add overhead in RelocIterator::next(), a performance-sensitive path), we initialize _current with a dummy static variable. This pointer is never dereferenced, so its actual value is not important - it just serves to avoid undefined behavior. > > RelocIterator::RelocIterator constructor can initialize _current pointer as well. However, in that place we have an assert to ensure that nullptr value is not allowed, and it seems we do not need to apply dummy value there. > > Testing: > > The fix has been verified against the failure in JDK-8352112. The issue no longer reproduces with this patch, regardless of whether the original fix from #24102 is applied. My testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24203#pullrequestreview-2719932274 From rraj at openjdk.org Thu Mar 27 06:37:54 2025 From: rraj at openjdk.org (Rohit Arul Raj) Date: Thu, 27 Mar 2025 06:37:54 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v3] In-Reply-To: References: Message-ID: > In JDK-8309130, Array sort was optimized using AVX512 SIMD instructions for x86_64. Currently, this optimization has been disabled for AMD Zen 4 [JDK-8317763] due to bad performance of compressstoreu. > Ref: https://www.reddit.com/r/java/comments/171t5sj/heads_up_openjdk_implementation_of_avx512_based/. > > This patch enables Zen 4 to pick optimized AVX2 version of SIMD sort and Zen 5 picks the AVX512 version. > > JTREG Tests: Completed Tier1 & Tier2 tests on Zen4 & Zen5 - No Regressions. > > Attaching ArraySort performance data for Zen4 & Zen5. 
> [Zen4-ArraySort-Data.txt](https://github.com/user-attachments/files/19245831/Zen4-ArraySort-Data.txt) > [Zen5-ArraySort-Data.txt](https://github.com/user-attachments/files/19245833/Zen5-ArraySort-Data.txt) Rohit Arul Raj has updated the pull request incrementally with one additional commit since the last revision: Refactor 'supports_avx512_simd_sort' code to make it easily readable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24053/files - new: https://git.openjdk.org/jdk/pull/24053/files/42011911..b369de6f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24053&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24053&range=01-02 Stats: 9 lines in 1 file changed: 7 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24053/head:pull/24053 PR: https://git.openjdk.org/jdk/pull/24053 From epeter at openjdk.org Thu Mar 27 06:54:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Mar 2025 06:54:07 GMT Subject: RFR: 8344942: Template-Based Testing Framework In-Reply-To: <_ISX3hSaQuWOvkh8KUsOS69y_aDB6JZGSWsVT1DWq4k=.de29649c-00f1-4e84-9a46-75ef89e8e30a@github.com> References: <_ISX3hSaQuWOvkh8KUsOS69y_aDB6JZGSWsVT1DWq4k=.de29649c-00f1-4e84-9a46-75ef89e8e30a@github.com> Message-ID: <0wlSWVjicgOUdlPgHckGklmEWfmZEVeE91BDjgJugRo=.eaf220fa-61ac-4e48-b535-35a2bb5c55c3@github.com> On Thu, 27 Mar 2025 05:19:43 GMT, Galder Zamarre?o wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
>> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
>> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want to generate the `javadocs`: >> `javadoc -sourcepath test/hotspot/jtreg:./test/lib compiler.lib.template_framework` >> >> **History** >> @TobiHartmann and I have played with code generators for a while, and have had ... > > Looks great though I'm not too familiar with the code to be able to do a reasonable review, but I had a question: > > Have you got any practical use case that can show where you've used this and show what it takes to build such a use case? `VectorReduction2` or similar type of microbenchmarks would be great to see auto generated using this? > > The reason I ask this is because I feel that something that is missing in this PR is a small practical use case where this framework is put into action to actually generate some jtreg/IR/microbenchmark test and see how it runs as part of the CI in the PR. WDYT? @galderz Thanks for your questions! > Looks great though I'm not too familiar with the code to be able to do a reasonable review Well the code is all brand new, so really anybody could review ;) > Have you got any practical use case that can show where you've used this and show what it takes to build such a use case? I actually have a list of experiments in this branch (it is linked in the PR description): https://github.com/openjdk/jdk/pull/23418 Some of them use the IR framework, though for now just as a testing harness, not for IR rules. Generating IR rules automatically requires quite a bit of logic... I hope that is satisfactory for now? Ah, but there was this test: `test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java` I did now not refactor it, but it would not be too hard to see how to use the Templates for it. And I do generate IR rules in that one. I don't super like just refactoring old tests... 
there is always a risk of breaking it and then coverage is worse than before... > The reason I ask this is because I feel that something that is missing in this PR is a small practical use case where this framework is put into action to actually generate some jtreg/IR/microbenchmark test and see how it runs as part of the CI in the PR. WDYT? I also have a few tests in this PR that just generate regular JTREG tests, without the IR framework, did you see those? > VectorReduction2 or similar type of microbenchmarks would be great to see auto generated using this? I don't yet have a solution for microbenchmarks. It's mostly an issue of including the `test/hotspot/jtreg/compiler/lib` path... And I fear that JMH requires all benchmark code to be compiled beforehand, and not dynamically as I do with the class loader. But maybe there is a solution for that. The patch is already quite large, and so I wanted to just publish the basic framework. Do you think that is ok? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2756920241 From epeter at openjdk.org Thu Mar 27 07:15:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Mar 2025 07:15:07 GMT Subject: RFR: 8352316: More MergeStoreBench In-Reply-To: References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: <9opjjavobIQboajylvMxhxsqmPThAoj1UHA7zLblegk=.3901885e-535f-4023-bb23-e63fe36d2d20@github.com> On Thu, 27 Mar 2025 05:17:44 GMT, Shaojin Wen wrote: >> I'm a developer of fastjson2. According to third-party benchmarks from https://github.com/fabienrenaud/java-json-benchmark, our library demonstrates the best performance. I would like to contribute some of these optimization techniques to OpenJDK, ideally by having C2 (the JIT compiler) directly support them. >> >> Below is an example related to this PR. 
We have a JavaBean that needs to be serialized to a JSON string:
>>
>> * JavaBean
>>
>> class Bean {
>>     public int value;
>> }
>>
>> * Target JSON Output
>>
>> {"value":123}
>>
>> * CodeGen-Generated JSONSerializer
>> fastjson2 uses ASM to generate a serializer class like the following. The methods writeNameValue0, writeNameValue1, and writeNameValue2 are candidate implementations. Among them, writeNameValue2 is the fastest when the field name length is 8, as it leverages UNSAFE.putLong for direct memory operations:
>>
>> class BeanJSONSerializer {
>>     private static final String name = "\"value\":";
>>     private static final byte[] nameBytes = name.getBytes();
>>     private static final long nameLong = UNSAFE.getLong(nameBytes, ARRAY_BYTE_BASE_OFFSET);
>>
>>     int writeNameValue0(byte[] bytes, int off, int value) {
>>         name.getBytes(0, 8, bytes, off);
>>         off += 8;
>>         return writeInt32(bytes, off, value);
>>     }
>>
>>     int writeNameValue1(byte[] bytes, int off, int value) {
>>         System.arraycopy(nameBytes, 0, bytes, off, 8);
>>         off += 8;
>>         return writeInt32(bytes, off, value);
>>     }
>>
>>     int writeNameValue2(byte[] bytes, int off, int value) {
>>         UNSAFE.putLong(bytes, ARRAY_BYTE_BASE_OFFSET + off, nameLong);
>>         off += 8;
>>         return writeInt32(bytes, off, value);
>>     }
>> }
>>
>> We propose that the C2 compiler could optimize cases where the field name length is 4 or 8 bytes by automatically using direct memory operations similar to writeNameValue2. This would eliminate the need for manual unsafe operations in user code and improve serialization performance for common patterns.
>
>> @wenshao Do you have any insight from this benchmark? What was your motivation for it?
>>
>> I also wonder if an IR test for some of the cases would be helpful. IR tests give us more info about what the compiler produced, and if there is a change in VM behaviour the IR test catches it in regular testing. Benchmarks are not run regularly, and regressions would therefore not be caught.
>
> I submitted this benchmark to prove that the performance of System.arraycopy or String.getBytes can be improved by Unsafe.putInt/putLong. I hope C2 can do this optimization automatically.

@wenshao
> I hope C2 can do this optimization automatically.

Did you check if it does or does not do that?

Can you investigate what the generated code is for `String.getBytes`? Does that not create an allocation, which would make things much slower? And it may even do some more complicated encoding things, which is a lot of overhead. So that would explain your performance result, at least partially, right?

I'm also not convinced that you are comparing apples to apples here.

Benchmark                             Mode  Cnt     Score    Error  Units
MergeStoreBench.putNull_arraycopy     avgt    5  8029.622 ± 60.856  ns/op

This does an array copy, so an array load AND an array store, right?

This one even has to do allocations, loads and stores (though you need to investigate and check):

MergeStoreBench.putNull_getBytes      avgt    5  6171.538 ±  5.845  ns/op

On the other hand, this does NOT have to do an array load or allocations, just a simple store:

MergeStoreBench.putNull_unsafePutInt  avgt    5   235.302 ±  2.004  ns/op

Is there actually a benchmark in this series that makes use of individual byte stores that get merged to an int store? Because that is the whole point of MergeStores, right?

Do you really need to use `String.getBytes`? I mean maybe with proper escape analysis etc the whole allocation could be avoided. But that would require a much deeper analysis.

Back to this:
> I hope C2 can do this optimization automatically.

Can you investigate what code it generates, and what kinds of optimizations are missing to make it close in performance to the `Unsafe` benchmark? I don't have time to do all the deep investigations myself. But feel free to ask me if you have more questions.
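For readers following along, the kind of code the "individual byte stores that get merged to an int store" question refers to might look like the sketch below. This is an illustration only, not code from MergeStoreBench: the class `MergeStoreSketch` and its method names are hypothetical, and a `VarHandle` byte-array view stands in for `Unsafe.putInt` so the example needs no internal APIs. Whether C2 actually merges the four byte stores depends on the JDK version and platform, so that would still need to be checked in the generated code.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch: two ways to write the 4 bytes "null" into a byte[].
public class MergeStoreSketch {
    // View of a byte[] that reads/writes 4 bytes at a time as a little-endian int.
    private static final VarHandle INT_LE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // 'n','u','l','l' packed so that the lowest byte is written first.
    private static final int NULL_LE = 'n' | ('u' << 8) | ('l' << 16) | ('l' << 24);

    // Four adjacent constant byte stores: the pattern a store-merging
    // optimization would be expected to look for.
    static int putNullByteStores(byte[] bytes, int off) {
        bytes[off]     = 'n';
        bytes[off + 1] = 'u';
        bytes[off + 2] = 'l';
        bytes[off + 3] = 'l';
        return off + 4;
    }

    // One explicit 4-byte store, comparable in effect to Unsafe.putInt.
    static int putNullIntStore(byte[] bytes, int off) {
        INT_LE.set(bytes, off, NULL_LE);
        return off + 4;
    }

    public static void main(String[] args) {
        byte[] a = new byte[8];
        byte[] b = new byte[8];
        putNullByteStores(a, 4);
        putNullIntStore(b, 4);
        System.out.println(Arrays.equals(a, b));
    }
}
```

Neither variant loads from a source array or allocates, which makes them a closer like-for-like pair than `System.arraycopy` or `String.getBytes` against a single `putInt`.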
@wenshao Since we don't seem to be comparing apples to apples here, it would be even more important to leave comments at the benchmarks to say what operations (loads, stores, allocations, etc) are happening. And what we know is optimized, and what we think could be optimized in the future. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2756960435 PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2756965014 From mchevalier at openjdk.org Thu Mar 27 07:27:13 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 27 Mar 2025 07:27:13 GMT Subject: RFR: 8352617: IR framework test TestCompileCommandFileWriter.java runs TestCompilePhaseCollector instead of itself In-Reply-To: References: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> <4kcnJ_hXXskugWzn0x2cQNwH0cXpiftOQMTJlqdKKpI=.8aae37b8-2895-4cdc-b70e-c2f7cebed0b5@github.com> Message-ID: On Wed, 26 Mar 2025 09:34:49 GMT, Christian Hagedorn wrote: >> test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/flag/TestCompileCommandFileWriter.java line 85: >> >>> 83: private void check(Class testClass, boolean findIdeal, boolean findOpto, CompilePhase... compilePhases) throws IOException { >>> 84: var compilerDirectivesFlagBuilder = new CompilerDirectivesFlagBuilder(testClass); >>> 85: compilerDirectivesFlagBuilder.build(); >> >> I was tempted to write >> >> new CompilerDirectivesFlagBuilder(testClass).build(); >> >> since we don't use `compilerDirectivesFlagBuilder` after. But I felt like the style might not be liked. Opinions on that? > > I guess it's fine to go with `new CompilerDirectivesFlagBuilder(testClass).build()`. If there is no strong preference, I'll just leave it as it is: having the variable doesn't hurt, as far as I know, and it helps slightly if one wants to debug, to make a breakpoint, to inspect the state of the object etc.. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24240#discussion_r2015817777 From rehn at openjdk.org Thu Mar 27 07:35:18 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 27 Mar 2025 07:35:18 GMT Subject: RFR: 8352897: RISC-V: Change default value for UseConservativeFence In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 01:57:55 GMT, Fei Yang wrote: >> Hi, please consider. >> >> gcc has stopped emitting io-bits for fences since version 13. >> And we need to use a newer gcc due to other compiler bugs. >> Therefore there is no point in letting the JIT emit io-bits when the runtime doesn't have them. >> >> Thanks, Robbin > > Looks good. Thanks @RealFYang! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24233#issuecomment-2757004923 From rehn at openjdk.org Thu Mar 27 07:35:18 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 27 Mar 2025 07:35:18 GMT Subject: Integrated: 8352897: RISC-V: Change default value for UseConservativeFence In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 15:49:40 GMT, Robbin Ehn wrote: > Hi, please consider. > > gcc has stopped emitting io-bits for fences since version 13. > And we need to use a newer gcc due to other compiler bugs. > Therefore there is no point in letting the JIT emit io-bits when the runtime doesn't have them. > > Thanks, Robbin This pull request has now been integrated.
Changeset: 10078111 Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/10078111aff4e095276ceccd250a25851f33a2ab Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod 8352897: RISC-V: Change default value for UseConservativeFence Reviewed-by: luhenry, fyang ------------- PR: https://git.openjdk.org/jdk/pull/24233 From hgreule at openjdk.org Thu Mar 27 07:37:17 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 27 Mar 2025 07:37:17 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations [v3] In-Reply-To: References: Message-ID: <-PY-YRSgE1RuApCBZdobGdBKFSq8dX1vrZp1FCn_oTY=.e38a2ecd-0272-441b-9ff2-c8f8028eea9c@github.com> On Thu, 13 Mar 2025 14:00:27 GMT, Hannes Greule wrote: >> subnode has multiple nodes that are self-inverse but lacking the respective optimization. ReverseINode and ReverseLNode already have the optimization, but we can deduplicate the code for all those operations. >> >> For most nodes, the optimization is obvious. The NegF/DNodes however are worth to look at in detail imo: >> - `Float.NaN` has the same bits set as `-Float.NaN`. That means, it this specific case, the operation is a no-op anyway >> - For other values, the msb is flipped, flipping twice results in the original value again. >> >> Similar changes could be made to the corresponding vector nodes. If you want, I can do that in a follow-up RFE. >> >> One note: During benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like >> >> int v = 0; >> for (int datum : data) { >> v ^= Integer.reverseBytes(Integer.reverseBytes(datum)); >> } >> return v; >> >> was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > add tests for ReverseBytesS/ReverseBytesUS Thank you both for your reviews. 
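The self-inverse property that the quoted PR description relies on can be checked at the Java level with a small sketch like the one below. It is only an illustration of the arithmetic, not the C2 `Identity` code from the patch; the class `SelfInverseSketch` and its method names are hypothetical. Floats are compared through `Float.floatToIntBits`, which canonicalizes NaN, so the check does not depend on how a particular platform treats the NaN sign bit.

```java
// Hypothetical sketch: applying a self-inverse operation twice gives back the input.
public class SelfInverseSketch {
    static int reverseBytesTwice(int x) {
        return Integer.reverseBytes(Integer.reverseBytes(x));
    }

    static long reverseBitsTwice(long x) {
        return Long.reverse(Long.reverse(x));
    }

    static short reverseShortBytesTwice(short x) {
        return Short.reverseBytes(Short.reverseBytes(x));
    }

    static char reverseCharBytesTwice(char x) {
        return Character.reverseBytes(Character.reverseBytes(x));
    }

    // Negation flips the sign bit, so doing it twice restores the value.
    static float negateTwice(float x) {
        return -(-x);
    }

    static double negateTwice(double x) {
        return -(-x);
    }

    static boolean sameFloat(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    static boolean sameDouble(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        int[] ints = {0, 1, -1, 0x12345678, Integer.MIN_VALUE};
        for (int x : ints) {
            if (reverseBytesTwice(x) != x) throw new AssertionError("int " + x);
        }
        float[] floats = {0.0f, -0.0f, 1.5f, Float.NaN, Float.NEGATIVE_INFINITY};
        for (float f : floats) {
            if (!sameFloat(negateTwice(f), f)) throw new AssertionError("float " + f);
        }
        System.out.println("all round trips returned the original value");
    }
}
```

Note that `-0.0f` is included on purpose: it round-trips to `-0.0f`, not `0.0f`, which is why the comparison uses the bit pattern rather than `==`.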
------------- PR Comment: https://git.openjdk.org/jdk/pull/23851#issuecomment-2757009806 From duke at openjdk.org Thu Mar 27 07:37:18 2025 From: duke at openjdk.org (duke) Date: Thu, 27 Mar 2025 07:37:18 GMT Subject: RFR: 8350988: Consolidate Identity of self-inverse operations [v3] In-Reply-To: References: Message-ID: <5oMjDg7wqXYAvgSBaxPvldERuMH111fLUJEOuGBXa34=.cd822c0e-b7e6-45bb-8d46-a7206cc41953@github.com> On Thu, 13 Mar 2025 14:00:27 GMT, Hannes Greule wrote: >> subnode has multiple nodes that are self-inverse but lacking the respective optimization. ReverseINode and ReverseLNode already have the optimization, but we can deduplicate the code for all those operations. >> >> For most nodes, the optimization is obvious. The NegF/DNodes however are worth to look at in detail imo: >> - `Float.NaN` has the same bits set as `-Float.NaN`. That means, it this specific case, the operation is a no-op anyway >> - For other values, the msb is flipped, flipping twice results in the original value again. >> >> Similar changes could be made to the corresponding vector nodes. If you want, I can do that in a follow-up RFE. >> >> One note: During benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like >> >> int v = 0; >> for (int datum : data) { >> v ^= Integer.reverseBytes(Integer.reverseBytes(datum)); >> } >> return v; >> >> was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > add tests for ReverseBytesS/ReverseBytesUS @SirYwell Your change (at version 0a48b5b859213ec38c4043ab9a7ab512bbf5ee0b) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23851#issuecomment-2757012125 From thartmann at openjdk.org Thu Mar 27 07:37:15 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 27 Mar 2025 07:37:15 GMT Subject: RFR: 8352617: IR framework test TestCompileCommandFileWriter.java runs TestCompilePhaseCollector instead of itself In-Reply-To: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> References: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> Message-ID: <1TpXWSwQrWwSyf5Q22XvrLYBmxJNZCADTmsv143BvIU=.c65a029e-2a11-41b8-b139-dc458d8d9499@github.com> On Wed, 26 Mar 2025 07:36:37 GMT, Marc Chevalier wrote: > Simply changing the path in `@run` was not enough (see JBS for details). And actually, the test wasn't doing anything before trying to open an output file to check the result. From various hints, I completed the test: I hope it was the intent! > > I think @chhagedorn's eye would be the most relevant. > > Thanks, > Marc test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/flag/TestCompileCommandFileWriter.java line 47: > 45: * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox > 46: * @run junit/othervm -Xbootclasspath/a:. -DSkipWhiteBoxInstall=true -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions > 47: * -XX:+WhiteBoxAPI compiler.lib.ir_framework.flag.TestCompileCommandFileWriter Do we really need the WhiteBoxAPI? `-XX:+IgnoreUnrecognizedVMOptions` can be removed as well. Please also update the copyright date. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24240#discussion_r2015830209 From hgreule at openjdk.org Thu Mar 27 07:42:23 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 27 Mar 2025 07:42:23 GMT Subject: Integrated: 8350988: Consolidate Identity of self-inverse operations In-Reply-To: References: Message-ID: On Sat, 1 Mar 2025 13:34:30 GMT, Hannes Greule wrote: > subnode has multiple nodes that are self-inverse but lack the respective optimization. ReverseINode and ReverseLNode already have the optimization, but we can deduplicate the code for all those operations. > > For most nodes, the optimization is obvious. The NegF/DNodes however are worth looking at in detail imo: > - `Float.NaN` has the same bits set as `-Float.NaN`. That means, in this specific case, the operation is a no-op anyway > - For other values, the msb is flipped; flipping twice results in the original value again. > > Similar changes could be made to the corresponding vector nodes. If you want, I can do that in a follow-up RFE. > > One note: During benchmarking those changes, I ran into https://bugs.openjdk.org/browse/JDK-8307516. That means code like > > int v = 0; > for (int datum : data) { > v ^= Integer.reverseBytes(Integer.reverseBytes(datum)); > } > return v; > > was vectorized before but is considered "not profitable" with the changes here, causing slowdowns in such cases. This pull request has now been integrated. 
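The self-inverse property that the summary above relies on can be checked at the Java level with a small sketch. This is only an illustration of why `op(op(x))` may be replaced by `x`; it is not the C2 `Identity` code from the changeset, and the class and method names (`SelfInverseDemo`, `reverseBytesTwice`, `negateTwiceBits`) are made up for this example. NaN is checked with `Float.isNaN` rather than by comparing raw bits, since the sketch makes no assumption about NaN bit patterns.

```java
public class SelfInverseDemo {
    // Applies Integer.reverseBytes twice; for a self-inverse operation
    // this is expected to give back the input unchanged.
    static int reverseBytesTwice(int x) {
        return Integer.reverseBytes(Integer.reverseBytes(x));
    }

    // Same idea for the 16-bit variant.
    static short reverseBytesTwice(short x) {
        return Short.reverseBytes(Short.reverseBytes(x));
    }

    // Negates a float twice and returns the (NaN-canonicalizing) bit pattern,
    // so that +0.0f and -0.0f can be told apart.
    static int negateTwiceBits(float f) {
        return Float.floatToIntBits(-(-f));
    }

    public static void main(String[] args) {
        int[] ints = {0, 1, -1, 0x12345678, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : ints) {
            if (reverseBytesTwice(x) != x) {
                throw new AssertionError("reverseBytes is not self-inverse for " + x);
            }
        }
        float[] floats = {0.0f, -0.0f, 1.5f, -3.25f,
                          Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY};
        for (float f : floats) {
            if (negateTwiceBits(f) != Float.floatToIntBits(f)) {
                throw new AssertionError("double negation changed " + f);
            }
        }
        // For NaN only the "still NaN" property is checked here.
        if (!Float.isNaN(-(-Float.NaN))) {
            throw new AssertionError("double negation of NaN is not NaN");
        }
        System.out.println("all self-inverse checks passed");
    }
}
```

Running `main` only confirms the algebraic property on a handful of values; whether the JIT actually folds the double application is a separate question that the IR tests in the PR address.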
Changeset: 66b5dba6 Author: Hannes Greule Committer: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/66b5dba690e7bd23054221cdc7f8394b0759876b Stats: 248 lines in 4 files changed: 223 ins; 8 del; 17 mod 8350988: Consolidate Identity of self-inverse operations Reviewed-by: epeter, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/23851 From duke at openjdk.org Thu Mar 27 08:02:17 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 27 Mar 2025 08:02:17 GMT Subject: RFR: 8350471: Unhandled compilation bailout in GraphKit::builtin_throw In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 19:30:18 GMT, Vladimir Kozlov wrote: > So where failed state come from? The failure occurred in a stress test due to a stressing decision and the callers of `GraphKit::builtin_throw()` did not handle the failure. From my investigation, I concluded that a failing state at that point cannot be reached without a StressBailout. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24243#issuecomment-2757060282 From epeter at openjdk.org Thu Mar 27 08:03:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Mar 2025 08:03:12 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v2] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
> > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/jtreg:./test/lib compiler.lib.template_framework` > > **History** > @TobiHartmann and I have played with code generators for a while, and have had the dream of doing that in a more principled way. And to hopefully... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 - fix tests - whitespace - whitespace - fix whitespace - JDK-8344942 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/a10d6fa0..ededf45b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=00-01 Stats: 38809 lines in 1314 files changed: 5216 ins; 31448 del; 2145 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From chagedorn at openjdk.org Thu Mar 27 08:09:55 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 27 Mar 2025 08:09:55 GMT Subject: RFR: 8350577: Fix missing Assertion Predicates when splitting loops [v2] In-Reply-To: References: Message-ID: > _Note: The actual fix is only ~80 changed lines - everything else is about tests._ > > After integrating many preparatory sub-tasks, I'm finally fixing the last outstanding Assertion Predicate issues with this patch. 
> > For more background about Assertion Predicates, have a look at the following [blog post](https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html). > > ### Maintain Assertion Predicates when Splitting a Loop > When performing Loop Predication on a counted loop, we create two Template Assertion Predicates for each hoisted range check. Whenever we split this loop as part of loop opts, we need to establish the Template Assertion Predicates at the new sub loops with the new loop init and stride and create Initialized Assertion Predicates from them. This ensures that a sub loop is folded when it becomes dead due to an impossible condition (e.g. always having a negative index for the hoisted range check in all loop iterations). > > #### Current State > Today, we are already covering most of the cases where Assertion Predicates are required - but we are still missing some. The following table describes the current state for the different loop splitting optimizations: > > | Loop Optimization | Template Assertion Predicate | Initialized Assertion Predicate | > | ------------------------ | --------------------------------------- | --------------------------------------- | > | Create Main Loop | ? | ? | > | Create Post Loop | ? | ? | > | Loop Unswitching | ? | _not required, same init, stride and, limit_ | > | Loop Unrolling | ? | ? | > | Range Check Elimination | ? | ? | > | Loop Peeling | ? | ? | > | Splitting Main Loop | ? | ? | > > Whenever we apply a loop optimization that does not establish Template Assertion Predicates, then all subsequent loop splitting optimizations on that loop cannot establish Template Assertion Predicates, either, and we fail to emit Initialized Assertion Predicates which can lead to a broken graph. > > #### Fixing Unsupported Cases > This patch provides fixes for the remaining unsupported cases as shown in the table above. 
With all the work done in previous PRs, the fix is quite straight forward: > - Remove the restriction that we don't clone Template Assertion Predicate in Loop Peeling and post loop creation. > - Remove the restriction that we only clone Template Assertion Predicate ... Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Fixing test failure 8353019 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24246/files - new: https://git.openjdk.org/jdk/pull/24246/files/9d80d846..fb25c10c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24246&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24246&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24246.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24246/head:pull/24246 PR: https://git.openjdk.org/jdk/pull/24246 From epeter at openjdk.org Thu Mar 27 08:12:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Mar 2025 08:12:07 GMT Subject: RFR: 8350577: Fix missing Assertion Predicates when splitting loops [v2] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 08:09:55 GMT, Christian Hagedorn wrote: >> _Note: The actual fix is only ~80 changed lines - everything else is about tests._ >> >> After integrating many preparatory sub-tasks, I'm finally fixing the last outstanding Assertion Predicate issues with this patch. >> >> For more background about Assertion Predicates, have a look at the following [blog post](https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html). >> >> ### Maintain Assertion Predicates when Splitting a Loop >> When performing Loop Predication on a counted loop, we create two Template Assertion Predicates for each hoisted range check. 
Whenever we split this loop as part of loop opts, we need to establish the Template Assertion Predicates at the new sub loops with the new loop init and stride and create Initialized Assertion Predicates from them. This ensures that a sub loop is folded when it becomes dead due to an impossible condition (e.g. always having a negative index for the hoisted range check in all loop iterations). >> >> #### Current State >> Today, we are already covering most of the cases where Assertion Predicates are required - but we are still missing some. The following table describes the current state for the different loop splitting optimizations: >> >> | Loop Optimization | Template Assertion Predicate | Initialized Assertion Predicate | >> | ------------------------ | --------------------------------------- | --------------------------------------- | >> | Create Main Loop | ? | ? | >> | Create Post Loop | ? | ? | >> | Loop Unswitching | ? | _not required, same init, stride and, limit_ | >> | Loop Unrolling | ? | ? | >> | Range Check Elimination | ? | ? | >> | Loop Peeling | ? | ? | >> | Splitting Main Loop | ? | ? | >> >> Whenever we apply a loop optimization that does not establish Template Assertion Predicates, then all subsequent loop splitting optimizations on that loop cannot establish Template Assertion Predicates, either, and we fail to emit Initialized Assertion Predicates which can lead to a broken graph. >> >> #### Fixing Unsupported Cases >> This patch provides fixes for the remaining unsupported cases as shown in the table above. With all the work done in previous PRs, the fix is quite straight forward: >> - Remove the restriction that we don't clone Template Assertion Predicate in Loop Peeling and post loop creation. >> - Remove the rest... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Fixing test failure 8353019 @chhagedorn Amazing work! 
This really took a lot of patience and persistence, but you did it! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24246#pullrequestreview-2720225610 From chagedorn at openjdk.org Thu Mar 27 08:16:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 27 Mar 2025 08:16:12 GMT Subject: RFR: 8350577: Fix missing Assertion Predicates when splitting loops [v2] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 08:09:55 GMT, Christian Hagedorn wrote: >> _Note: The actual fix is only ~80 changed lines - everything else is about tests._ >> >> After integrating many preparatory sub-tasks, I'm finally fixing the last outstanding Assertion Predicate issues with this patch. >> >> For more background about Assertion Predicates, have a look at the following [blog post](https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html). >> >> ### Maintain Assertion Predicates when Splitting a Loop >> When performing Loop Predication on a counted loop, we create two Template Assertion Predicates for each hoisted range check. Whenever we split this loop as part of loop opts, we need to establish the Template Assertion Predicates at the new sub loops with the new loop init and stride and create Initialized Assertion Predicates from them. This ensures that a sub loop is folded when it becomes dead due to an impossible condition (e.g. always having a negative index for the hoisted range check in all loop iterations). >> >> #### Current State >> Today, we are already covering most of the cases where Assertion Predicates are required - but we are still missing some. The following table describes the current state for the different loop splitting optimizations: >> >> | Loop Optimization | Template Assertion Predicate | Initialized Assertion Predicate | >> | ------------------------ | --------------------------------------- | --------------------------------------- | >> | Create Main Loop | ? | ? 
| >> | Create Post Loop | ? | ? | >> | Loop Unswitching | ? | _not required, same init, stride and, limit_ | >> | Loop Unrolling | ? | ? | >> | Range Check Elimination | ? | ? | >> | Loop Peeling | ? | ? | >> | Splitting Main Loop | ? | ? | >> >> Whenever we apply a loop optimization that does not establish Template Assertion Predicates, then all subsequent loop splitting optimizations on that loop cannot establish Template Assertion Predicates, either, and we fail to emit Initialized Assertion Predicates which can lead to a broken graph. >> >> #### Fixing Unsupported Cases >> This patch provides fixes for the remaining unsupported cases as shown in the table above. With all the work done in previous PRs, the fix is quite straight forward: >> - Remove the restriction that we don't clone Template Assertion Predicate in Loop Peeling and post loop creation. >> - Remove the rest... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Fixing test failure 8353019 There was a test failure report ([JDK-8353019](https://bugs.openjdk.org/browse/JDK-8353019)) about the "no flag" run adds with the previous PR (https://github.com/openjdk/jdk/pull/23823). The "no flag" run also added `-XX:+AbortVMOnCompilationFailure` which failed in a higher tier running with `-XX:+VerifyOops` due to a bailout in C1 when compiling `java.lang.classfile.Opcode::`. It's the very same problem also observed and described here: https://github.com/openjdk/jdk/pull/7214 For this test, I'm suggesting to just remove `XX:+AbortVMOnCompilationFailure` to make this a real "no flag" run. Since I'm touching this test now anyways, I've just squeezed the fix in here. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24246#issuecomment-2757090875 From chagedorn at openjdk.org Thu Mar 27 08:26:26 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 27 Mar 2025 08:26:26 GMT Subject: RFR: 8350577: Fix missing Assertion Predicates when splitting loops [v2] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 08:09:55 GMT, Christian Hagedorn wrote: >> _Note: The actual fix is only ~80 changed lines - everything else is about tests._ >> >> After integrating many preparatory sub-tasks, I'm finally fixing the last outstanding Assertion Predicate issues with this patch. >> >> For more background about Assertion Predicates, have a look at the following [blog post](https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html). >> >> ### Maintain Assertion Predicates when Splitting a Loop >> When performing Loop Predication on a counted loop, we create two Template Assertion Predicates for each hoisted range check. Whenever we split this loop as part of loop opts, we need to establish the Template Assertion Predicates at the new sub loops with the new loop init and stride and create Initialized Assertion Predicates from them. This ensures that a sub loop is folded when it becomes dead due to an impossible condition (e.g. always having a negative index for the hoisted range check in all loop iterations). >> >> #### Current State >> Today, we are already covering most of the cases where Assertion Predicates are required - but we are still missing some. The following table describes the current state for the different loop splitting optimizations: >> >> | Loop Optimization | Template Assertion Predicate | Initialized Assertion Predicate | >> | ------------------------ | --------------------------------------- | --------------------------------------- | >> | Create Main Loop | ? | ? | >> | Create Post Loop | ? | ? | >> | Loop Unswitching | ? 
| _not required, same init, stride and, limit_ | >> | Loop Unrolling | ? | ? | >> | Range Check Elimination | ? | ? | >> | Loop Peeling | ? | ? | >> | Splitting Main Loop | ? | ? | >> >> Whenever we apply a loop optimization that does not establish Template Assertion Predicates, then all subsequent loop splitting optimizations on that loop cannot establish Template Assertion Predicates, either, and we fail to emit Initialized Assertion Predicates which can lead to a broken graph. >> >> #### Fixing Unsupported Cases >> This patch provides fixes for the remaining unsupported cases as shown in the table above. With all the work done in previous PRs, the fix is quite straight forward: >> - Remove the restriction that we don't clone Template Assertion Predicate in Loop Peeling and post loop creation. >> - Remove the rest... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Fixing test failure 8353019 I will give this some more intensive testing before integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24246#issuecomment-2757160314 From chagedorn at openjdk.org Thu Mar 27 08:26:26 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 27 Mar 2025 08:26:26 GMT Subject: RFR: 8350577: Fix missing Assertion Predicates when splitting loops [v2] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 08:13:07 GMT, Christian Hagedorn wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixing test failure 8353019 > > There was a test failure report ([JDK-8353019](https://bugs.openjdk.org/browse/JDK-8353019)) about the "no flag" run adds with the previous PR (https://github.com/openjdk/jdk/pull/23823). The "no flag" run also added `-XX:+AbortVMOnCompilationFailure` which failed in a higher tier running with `-XX:+VerifyOops` due to a bailout in C1 when compiling `java.lang.classfile.Opcode::`. 
It's the very same problem also observed and described here: https://github.com/openjdk/jdk/pull/7214 > > For this test, I'm suggesting to just remove `XX:+AbortVMOnCompilationFailure` to make this a real "no flag" run. Since I'm touching this test now anyways, I've just squeezed the fix in here. > @chhagedorn Amazing work! This really took a lot of patience and persistence, but you did it! Thanks Emanuel! :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24246#issuecomment-2757156785 From mchevalier at openjdk.org Thu Mar 27 08:48:51 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 27 Mar 2025 08:48:51 GMT Subject: RFR: 8352617: IR framework test TestCompileCommandFileWriter.java runs TestCompilePhaseCollector instead of itself [v2] In-Reply-To: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> References: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> Message-ID: > Simply changing the path in `@run` was not enough (see JBS for details). And actually, the test wasn't doing anything before trying to open an output file to check the result. From various hints, I completed the test: I hope it was the intent! > > I think @chhagedorn's eye would be the most relevant. 
> > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: fix copyright and remove IgnoreUnrecognizedVMOptions and remove vm.debug ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24240/files - new: https://git.openjdk.org/jdk/pull/24240/files/321ebab8..437a4913 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24240&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24240&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24240.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24240/head:pull/24240 PR: https://git.openjdk.org/jdk/pull/24240 From mchevalier at openjdk.org Thu Mar 27 08:48:51 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 27 Mar 2025 08:48:51 GMT Subject: RFR: 8352617: IR framework test TestCompileCommandFileWriter.java runs TestCompilePhaseCollector instead of itself In-Reply-To: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> References: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> Message-ID: On Wed, 26 Mar 2025 07:36:37 GMT, Marc Chevalier wrote: > Simply changing the path in `@run` was not enough (see JBS for details). And actually, the test wasn't doing anything before trying to open an output file to check the result. From various hints, I completed the test: I hope it was the intent! > > I think @chhagedorn's eye would be the most relevant. > > Thanks, > Marc Fixed as requested. Also, vm.debug == true seems unnecessary. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24240#issuecomment-2757209965 From mchevalier at openjdk.org Thu Mar 27 08:48:52 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 27 Mar 2025 08:48:52 GMT Subject: RFR: 8352617: IR framework test TestCompileCommandFileWriter.java runs TestCompilePhaseCollector instead of itself [v2] In-Reply-To: <1TpXWSwQrWwSyf5Q22XvrLYBmxJNZCADTmsv143BvIU=.c65a029e-2a11-41b8-b139-dc458d8d9499@github.com> References: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> <1TpXWSwQrWwSyf5Q22XvrLYBmxJNZCADTmsv143BvIU=.c65a029e-2a11-41b8-b139-dc458d8d9499@github.com> Message-ID: On Thu, 27 Mar 2025 07:34:19 GMT, Tobias Hartmann wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> fix copyright and remove IgnoreUnrecognizedVMOptions and remove vm.debug > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/flag/TestCompileCommandFileWriter.java line 47: > >> 45: * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox >> 46: * @run junit/othervm -Xbootclasspath/a:. -DSkipWhiteBoxInstall=true -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions >> 47: * -XX:+WhiteBoxAPI compiler.lib.ir_framework.flag.TestCompileCommandFileWriter > > Do we really need the WhiteBoxAPI? `-XX:+IgnoreUnrecognizedVMOptions` can be removed as well. Please also update the copyright date. We do need the WhiteBoxAPI. IgnoreUnrecognizedVMOptions removed, date updated. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24240#discussion_r2015967264 From mbaesken at openjdk.org Thu Mar 27 09:02:07 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 27 Mar 2025 09:02:07 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 13:37:23 GMT, Matthias Baesken wrote: > When running jtreg tests on macOS aarch64 with ubsan - enabled binaries, in the test > java/foreign/TestHandshake > this error/warning is reported : > > jdk/src/hotspot/share/opto/block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' > UndefinedBehaviorSanitizer:DEADLYSIGNAL > UndefinedBehaviorSanitizer: nested bug in the same thread, aborting. > > Seems it happens in this calculation (float value does not fit into an int) : int to_pct = (int) ((100 * freq) / target->_freq); I created https://bugs.openjdk.org/browse/JDK-8353041 Adjust block frequency calculation for the follow up. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23962#issuecomment-2757243222 From chagedorn at openjdk.org Thu Mar 27 09:13:22 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 27 Mar 2025 09:13:22 GMT Subject: RFR: 8352617: IR framework test TestCompileCommandFileWriter.java runs TestCompilePhaseCollector instead of itself [v2] In-Reply-To: References: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> Message-ID: On Thu, 27 Mar 2025 08:48:51 GMT, Marc Chevalier wrote: >> Simply changing the path in `@run` was not enough (see JBS for details). And actually, the test wasn't doing anything before trying to open an output file to check the result. From various hints, I completed the test: I hope it was the intent! >> >> I think @chhagedorn's eye would be the most relevant. 
>> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > fix copyright and remove IgnoreUnrecognizedVMOptions and remove vm.debug Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24240#pullrequestreview-2720440711 From duke at openjdk.org Thu Mar 27 09:24:01 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 27 Mar 2025 09:24:01 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v4] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 15:27:39 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. 
>> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from @chhagedorn > > Co-authored-by: Christian Hagedorn Implemented suggestions, merged master and reran testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24248#issuecomment-2757292302 From duke at openjdk.org Thu Mar 27 09:24:00 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 27 Mar 2025 09:24:00 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v5] In-Reply-To: References: Message-ID: > # Issue Summary > > When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. > > # Change Summary > > Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. > > Concretely, this PR > - adds parse predicate nodes to the IR testing framework, > - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, > - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, > - adds a regression test. 
> > > # Testing > > The changes passed the following testing: > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) > - tier1 through tier3 and Oracle internal testing Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Merge branch 'master' into JDK-8347449-loop-predicate - Improve help text for UseProfiledLoopPredicate argument - loopnode: cleaner control flow - Clean up IR test - Apply suggestions from @chhagedorn Co-authored-by: Christian Hagedorn - ir-framework: rename new nodes to convention - ir-framework: fix phase for parse predicate nodes - Make conditions on UseProfiledLoopPredicate first test UseLoopPredicate - Turn off UseProfiledLoopPredicate when UseLoopPredicate is turned off - Add regression IR test - ... and 1 more: https://git.openjdk.org/jdk/compare/d9538d7f...72ebfc8e ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24248/files - new: https://git.openjdk.org/jdk/pull/24248/files/ea653995..72ebfc8e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=03-04 Stats: 30579 lines in 68 files changed: 463 ins; 29885 del; 231 mod Patch: https://git.openjdk.org/jdk/pull/24248.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24248/head:pull/24248 PR: https://git.openjdk.org/jdk/pull/24248 From duke at openjdk.org Thu Mar 27 09:24:05 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 27 Mar 2025 09:24:05 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v3] In-Reply-To: References: <5OOpDW693XhGVePrz6zYlm9gMZKneFaIfl6BJP-NQb0=.c5757e84-6b14-442b-b822-d0b1eeb5f913@github.com> Message-ID: On Wed, 26 Mar 2025 15:04:15 GMT, Christian Hagedorn wrote: >> Manuel 
Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> ir-framework: rename new nodes to convention > > src/hotspot/share/opto/c2_globals.hpp line 789: > >> 787: product(bool, UseProfiledLoopPredicate, true, \ >> 788: "Move predicates out of loops based on profiling data. " \ >> 789: "Requires UseLoopPredicate to be turned on (default).") \ > > It was already a bit vague before but I suggest to be more precise that we move checks with an uncommon trap out of a loop (and the resulting check before the loop is then a predicate): > > Move checks with an uncommon trap out of loops based on profiling data. Requires [...] Implemented in [f903729](https://github.com/openjdk/jdk/pull/24248/commits/f90372927a4d7ed82740014934f4409648d42bca) > src/hotspot/share/opto/loopnode.cpp line 4304: > >> 4302: tty->print(" profile_predicated"); >> 4303: } >> 4304: if (UseLoopPredicate && predicates.loop_predicate_block()->is_non_empty()) { > > Maybe you can merge these blocks: > > if (UseLoopPredicate) { > if (UseProfiledLoopPredicate && predicates.profiled_loop_predicate_block()->is_non_empty()) { > tty->print(" profile_predicated"); > } > if (predicates.loop_predicate_block()->is_non_empty()) { > tty->print(" predicated"); > } > } Implemented in [1aba1c2](https://github.com/openjdk/jdk/pull/24248/commits/1aba1c23f0e90e0a6717bdf7c441451b8e9c3efc) > test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1524: > >> 1522: public static final String PARSE_PREDICATE_LOOP = PREFIX + "PARSE_PREDICATE_LOOP" + POSTFIX; >> 1523: static { >> 1524: parsePredicateNodes(PARSE_PREDICATE_LOOP, "Loop"); > > I suggest the following names found in `predicates.hpp`: > https://github.com/openjdk/jdk/blob/79bffe2f28f90986d45f4e91efc021290b4fc00a/src/hotspot/share/opto/predicates.hpp#L48-L50 Implemented in [7f24c87](https://github.com/openjdk/jdk/pull/24248/commits/7f24c87557da33bbb96a3596222b4737a06d9d31) > 
test/hotspot/jtreg/compiler/predicates/TestDisabledLoopPredicates.java line 41: > >> 39: static final int WARMUP = 10_000; >> 40: static final int SIZE = 100; >> 41: static final int min = 3; > > Since `min` is also a constant, you should capitalize it. Implemented in [7f24c87](https://github.com/openjdk/jdk/pull/24248/commits/7f24c87557da33bbb96a3596222b4737a06d9d31) > test/hotspot/jtreg/compiler/predicates/TestDisabledLoopPredicates.java line 46: > >> 44: TestFramework.runWithFlags("-XX:+UseLoopPredicate", >> 45: "-XX:+UseProfiledLoopPredicate"); >> 46: TestFramework.runWithFlags("-XX:-UseLoopPredicate"); > > You could also add a run where you only disable `-XX:-UseProfiledLoopPredicate` for completeness and add an IR rule accordingly. Implemented in [7f24c87](https://github.com/openjdk/jdk/pull/24248/commits/7f24c87557da33bbb96a3596222b4737a06d9d31) > test/hotspot/jtreg/compiler/predicates/TestDisabledLoopPredicates.java line 74: > >> 72: @IR(counts = { IRNode.PARSE_PREDICATE_LOOP, "=1", >> 73: IRNode.PARSE_PREDICATE_PROFILED_LOOP, "1" }, >> 74: phase = CompilePhase.AFTER_PARSING, > > `phase` is not required since you've decided that `AFTER_PARSING` is the default phase where we match this node on. You only need to specify `phase` if you want to match on a different phase. 
Implemented in [7f24c87](https://github.com/openjdk/jdk/pull/24248/commits/7f24c87557da33bbb96a3596222b4737a06d9d31) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2016037388 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2016040418 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2016039828 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2016039142 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2016037918 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2016038370 From mbaesken at openjdk.org Thu Mar 27 09:27:58 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 27 Mar 2025 09:27:58 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' [v2] In-Reply-To: References: Message-ID: > When running jtreg tests on macOS aarch64 with ubsan - enabled binaries, in the test > java/foreign/TestHandshake > this error/warning is reported : > > jdk/src/hotspot/share/opto/block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' > UndefinedBehaviorSanitizer:DEADLYSIGNAL > UndefinedBehaviorSanitizer: nested bug in the same thread, aborting. 
> > Seems it happens in this calculation (float value does not fit into an int) : int to_pct = (int) ((100 * freq) / target->_freq); Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: calculate from_pct like we did before, clamp to_pct ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23962/files - new: https://git.openjdk.org/jdk/pull/23962/files/f8f63019..bf358c08 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23962&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23962&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23962.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23962/head:pull/23962 PR: https://git.openjdk.org/jdk/pull/23962 From chagedorn at openjdk.org Thu Mar 27 10:08:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 27 Mar 2025 10:08:15 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v5] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 09:24:00 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale, that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. 
>> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into JDK-8347449-loop-predicate > - Improve help text for UseProfiledLoopPredicate argument > - loopnode: cleaner control flow > - Clean up IR test > - Apply suggestions from @chhagedorn > > Co-authored-by: Christian Hagedorn > - ir-framework: rename new nodes to convention > - ir-framework: fix phase for parse predicate nodes > - Make conditions on UseProfiledLoopPredicate first test UseLoopPredicate > - Turn off UseProfiledLoopPredicate when UseLoopPredicate is turned off > - Add regression IR test > - ... and 1 more: https://git.openjdk.org/jdk/compare/4130165c...72ebfc8e Thanks for the updates, looks good now! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24248#pullrequestreview-2720657747 From thartmann at openjdk.org Thu Mar 27 10:18:15 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 27 Mar 2025 10:18:15 GMT Subject: RFR: 8352617: IR framework test TestCompileCommandFileWriter.java runs TestCompilePhaseCollector instead of itself [v2] In-Reply-To: References: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> Message-ID: On Thu, 27 Mar 2025 08:48:51 GMT, Marc Chevalier wrote: >> Simply changing the path in `@run` was not enough (see JBS for details). And actually, the test wasn't doing anything before trying to open an output file to check the result. From various hints, I completed the test: I hope it was the intent! >> >> I think @chhagedorn's eye would be the most relevant. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > fix copyright and remove IgnoreUnrecognizedVMOptions and remove vm.debug Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24240#pullrequestreview-2720687799 From mdoerr at openjdk.org Thu Mar 27 10:24:12 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Mar 2025 10:24:12 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory In-Reply-To: <7jVhKpVfHqZRix25pTXA28NA2RW2OKuWH1XR-TIfpLw=.ccd76cbd-7d73-4df6-8f54-0512a1d0cae9@github.com> References: <7jVhKpVfHqZRix25pTXA28NA2RW2OKuWH1XR-TIfpLw=.ccd76cbd-7d73-4df6-8f54-0512a1d0cae9@github.com> Message-ID: On Wed, 26 Mar 2025 16:34:58 GMT, Martin Doerr wrote: >> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory. This intrinsic optimizes about 15%-20% unsafe setmemory time > > You may want to use `UnsafeMemoryAccessMark` as on x86. > @TheRealMDoerr Thanks for your kindly reply. 
I found that the main logic on the x86 is in 'generate_unsafe_setmemory' function, while the main logic on the riscv and aarch64 is in generate_fill. I have not found 'UnsafeMemoryAccess' on aarch64 in generate_fill, I will check whether we need to add it and where to insert it if needed on riscv. Note that generate_fill is normally not used for Unsafe accesses and hence doesn't need 'UnsafeMemoryAccessMark'. I'm using it a bit differently here: https://github.com/openjdk/jdk/pull/24254 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2757508131 From duke at openjdk.org Thu Mar 27 10:30:14 2025 From: duke at openjdk.org (duke) Date: Thu, 27 Mar 2025 10:30:14 GMT Subject: RFR: 8352617: IR framework test TestCompileCommandFileWriter.java runs TestCompilePhaseCollector instead of itself [v2] In-Reply-To: References: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> Message-ID: On Thu, 27 Mar 2025 08:48:51 GMT, Marc Chevalier wrote: >> Simply changing the path in `@run` was not enough (see JBS for details). And actually, the test wasn't doing anything before trying to open an output file to check the result. From various hints, I completed the test: I hope it was the intent! >> >> I think @chhagedorn's eye would be the most relevant. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > fix copyright and remove IgnoreUnrecognizedVMOptions and remove vm.debug @marc-chevalier Your change (at version 437a49134f78adefffe35c05d222f199c4bef2f5) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24240#issuecomment-2757526350 From mchevalier at openjdk.org Thu Mar 27 10:30:13 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 27 Mar 2025 10:30:13 GMT Subject: RFR: 8352617: IR framework test TestCompileCommandFileWriter.java runs TestCompilePhaseCollector instead of itself [v2] In-Reply-To: References: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> Message-ID: On Thu, 27 Mar 2025 08:48:51 GMT, Marc Chevalier wrote: >> Simply changing the path in `@run` was not enough (see JBS for details). And actually, the test wasn't doing anything before trying to open an output file to check the result. From various hints, I completed the test: I hope it was the intent! >> >> I think @chhagedorn's eye would be the most relevant. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > fix copyright and remove IgnoreUnrecognizedVMOptions and remove vm.debug Thanks @chhagedorn and @TobiHartmann ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24240#issuecomment-2757525138 From duke at openjdk.org Thu Mar 27 10:38:11 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 27 Mar 2025 10:38:11 GMT Subject: RFR: 8350471: Unhandled compilation bailout in GraphKit::builtin_throw In-Reply-To: References: Message-ID: <7N-LFT3DkRztiuRhJsGwmHq63ddMhDciOLmORymCjss=.83666ddd-4c6e-4698-8d1a-c63fa8bc2030@github.com> On Thu, 27 Mar 2025 07:59:10 GMT, Manuel Hässig wrote: > So where failed state come from? Maybe in a bit more detail: The failing test ran with `-XX:+StressBailout`, which makes "stress decisions" to exercise the `failing()` calls throughout C2. In our case, the removed line in `GraphKit::builtin_throw()` was randomly selected to bail out for testing purposes _without_ a failing state. 
Concretely, in the `Compile::failing()` code path, the execution entered `Compile::fail_randomly()` due to `StressBailout == true`: https://github.com/openjdk/jdk/blob/4100dc9d4cdd5f0c202b2b2a32554e3aa4f15025/src/hotspot/share/opto/compile.hpp#L811-L822 In `fail_randomly()` the random number generator doomed the execution to fail: https://github.com/openjdk/jdk/blob/4100dc9d4cdd5f0c202b2b2a32554e3aa4f15025/src/hotspot/share/opto/compile.cpp#L4998-L5004 We know that it was a stress bailout from the logs: Pending compilation failure details for thread 0x00007f2638217ad0: Time: 138.483687 seconds (0d 0h 2m 18s) Compile id: 24732 Reason: 'StressBailout' Knowing this and finding that there cannot be a failure state from the execution in `GraphKit::builtin_throw()`, I concluded that this test failure is a false positive. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24243#issuecomment-2757546229 From bulasevich at openjdk.org Thu Mar 27 11:34:16 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 27 Mar 2025 11:34:16 GMT Subject: RFR: 8352426: RelocIterator should correctly handle nullptr address of relocation data In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 17:04:04 GMT, Boris Ulasevich wrote: > This is a follow-up to the recent #24102 patch. It addresses an issue where RelocIterator may receive a nullptr as the relocation table address. This change can also serve as an independent fix for JDK-8352112. > > RelocIterator::initialize() and RelocIterator::next() perform decrement/increment operations on an internal relocation pointer. > If nm->relocation_begin() returns nullptr, this results in undefined behavior, as pointer arithmetic on nullptr is prohibited by the C++ Standard. > > Instead of introducing a null-check (which would add overhead in RelocIterator::next(), a performance-sensitive path), we initialize _current with a dummy static variable. 
This pointer is never dereferenced, so its actual value is not important - it just serves to avoid undefined behavior. > > RelocIterator::RelocIterator constructor can initialize _current pointer as well. However, in that place we have an assert to ensure that nullptr value is not allowed, and it seems we do not need to apply dummy value there. > > Testing: > > The fix has been verified against the failure in JDK-8352112. The issue no longer reproduces with this patch, regardless of whether the original fix from #24102 is applied. Good. Thanks for testing. And thanks all for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24203#issuecomment-2757708815 From bulasevich at openjdk.org Thu Mar 27 11:34:17 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 27 Mar 2025 11:34:17 GMT Subject: Integrated: 8352426: RelocIterator should correctly handle nullptr address of relocation data In-Reply-To: References: Message-ID: <7gLqCqOmxaPRF7rsWlcfKSnObyONHonz5TNRvw7KvWk=.1b951ed2-afa2-486b-ad06-44f9ddbec96d@github.com> On Mon, 24 Mar 2025 17:04:04 GMT, Boris Ulasevich wrote: > This is a follow-up to the recent #24102 patch. It addresses an issue where RelocIterator may receive a nullptr as the relocation table address. This change can also serve as an independent fix for JDK-8352112. > > RelocIterator::initialize() and RelocIterator::next() perform decrement/increment operations on an internal relocation pointer. > If nm->relocation_begin() returns nullptr, this results in undefined behavior, as pointer arithmetic on nullptr is prohibited by the C++ Standard. > > Instead of introducing a null-check (which would add overhead in RelocIterator::next(), a performance-sensitive path), we initialize _current with a dummy static variable. This pointer is never dereferenced, so its actual value is not important - it just serves to avoid undefined behavior. > > RelocIterator::RelocIterator constructor can initialize _current pointer as well. 
However, in that place we have an assert to ensure that nullptr value is not allowed, and it seems we do not need to apply dummy value there. > > Testing: > > The fix has been verified against the failure in JDK-8352112. The issue no longer reproduces with this patch, regardless of whether the original fix from #24102 is applied. This pull request has now been integrated. Changeset: 0bfa636c Author: Boris Ulasevich URL: https://git.openjdk.org/jdk/commit/0bfa636c7f43e31c53c6bae6ee859131bd45229f Stats: 12 lines in 1 file changed: 9 ins; 0 del; 3 mod 8352426: RelocIterator should correctly handle nullptr address of relocation data Reviewed-by: dlong, vlivanov, kvn ------------- PR: https://git.openjdk.org/jdk/pull/24203 From mchevalier at openjdk.org Thu Mar 27 11:39:18 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 27 Mar 2025 11:39:18 GMT Subject: Integrated: 8352617: IR framework test TestCompileCommandFileWriter.java runs TestCompilePhaseCollector instead of itself In-Reply-To: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> References: <9NvSt_uPGn16s9oD-l2nlAxGXq8Ghz1wDfRE7iS9cIw=.4987fdf2-2922-416a-a275-01d31aee354c@github.com> Message-ID: On Wed, 26 Mar 2025 07:36:37 GMT, Marc Chevalier wrote: > Simply changing the path in `@run` was not enough (see JBS for details). And actually, the test wasn't doing anything before trying to open an output file to check the result. From various hints, I completed the test: I hope it was the intent! > > I think @chhagedorn's eye would be the most relevant. > > Thanks, > Marc This pull request has now been integrated. 
Changeset: 927aeb2f Author: Marc Chevalier Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/927aeb2feeacddfb7267e4d211134f061a2566e4 Stats: 13 lines in 1 file changed: 2 ins; 0 del; 11 mod 8352617: IR framework test TestCompileCommandFileWriter.java runs TestCompilePhaseCollector instead of itself Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24240 From jbhateja at openjdk.org Thu Mar 27 12:05:16 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 27 Mar 2025 12:05:16 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v5] In-Reply-To: References: Message-ID: <6X59Ix7zz_LlScbSDJoOTKQE86zcsk3m3kYG1fZGPPw=.919d75c7-14b5-4887-a198-29c0af290118@github.com> On Wed, 26 Mar 2025 19:24:51 GMT, Jatin Bhateja wrote: >> This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. >> >> Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.felixcloutier.com/x86/vminsh >> [2] https://www.felixcloutier.com/x86/vmaxsh > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Hi @eme64 , Can you kindly verify your review resolution and approve if your validation runs are all green? 
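Editorial sketch for the Float16 max/min description quoted above: one possible reading of the "shuffle the operands" idea, written in plain C++ and not taken from the patch. `float` stands in for Float16, and all function names are hypothetical. The two `x86_style_*` helpers model the rule from the ISA references [1][2], where the second operand is returned when a NaN is involved or when both operands are zeros. Choosing the swap by the sign bit of the first input is an assumption about how the shuffle could be done.

```cpp
#include <cmath>

// Model of the x86 scalar min/max rule: if the operands are unordered
// (a NaN is involved) or compare equal (e.g. +0.0 and -0.0), the
// comparison is false and the SECOND operand is returned.
inline float x86_style_min(float first, float second) {
  return (first < second) ? first : second;
}

inline float x86_style_max(float first, float second) {
  return (first > second) ? first : second;
}

// Hypothetical wrapper giving Java-style semantics: NaN if either input
// is NaN, and -0.0 treated as smaller than +0.0.
inline float java_style_min(float a, float b) {
  // Shuffle so that a negative 'a' (including -0.0) lands in the second slot.
  const bool a_negative = std::signbit(a);
  const float first  = a_negative ? b : a;
  const float second = a_negative ? a : b;
  const float raw = x86_style_min(first, second);
  // A NaN in the second slot already propagates through 'raw';
  // a NaN in the first slot has to be restored explicitly.
  return std::isnan(first) ? first : raw;
}

inline float java_style_max(float a, float b) {
  // Mirror image: a non-negative 'a' (including +0.0) lands in the second slot.
  const bool a_negative = std::signbit(a);
  const float first  = a_negative ? a : b;
  const float second = a_negative ? b : a;
  const float raw = x86_style_max(first, second);
  return std::isnan(first) ? first : raw;
}
```

With this shuffle, `java_style_min(0.0f, -0.0f)` and `java_style_min(-0.0f, 0.0f)` both yield -0.0, and a NaN in either position yields NaN. The sketch assumes strict IEEE comparisons, so it would not hold under fast-math style compiler options.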
------------- PR Comment: https://git.openjdk.org/jdk/pull/24169#issuecomment-2757790101 From shade at openjdk.org Thu Mar 27 12:30:19 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 27 Mar 2025 12:30:19 GMT Subject: RFR: 8351156: C1: Remove FPU stack support after 32-bit x86 removal Message-ID: C1 has the 32-bit x86 specific code that supports x87 FPU stack allocations. With 32-bit x86 port removed, we can clean up those parts. 64-bit x86 does not need x87 FPU, since it is baselined on SSE2 and using XMM registers instead. There are lots of deeper cleanups possible, this PR focuses on removing the x87 FPU allocation/uses in C1. Note current C1 nomenclature is confusing. On all arches, "FPU" means floating-point _registers_. That is _except_ on x86, where "FPU" means x87 FPU stack, and "XMM" means floating-point registers. This is why we only touch "FPU" code on other architectures only in a light manner. After all 32-bit x86 cleanups land, we might consider renaming "XMM" -> "FPU" in x86 C1 to match the common nomenclature. Brief tour of changes: - FPU stack simulator is not needed anymore, so I removed it and the related infrastructure - Lots of 32-bit specific paths that touch x87 FPU registers are pruned - Related LIR nodes like `lir_fxch`, `lir_fld`, `lir_fpop_raw` are pruned - Simplified the API that is no longer needed, e.g. dropping `pop_fpu_stack` This PR would likely conflict with some in-flight cleanups, so it would require merges later. Take a look meanwhile. 
Additional testing: - [x] Linux x86_64 server fastdebug, `tier1` - [x] Linux x86_64 server fastdebug, `tier1` + `-XX:TieredStopAtLevel=1` - [ ] Linux x86_64 server fastdebug, `all` - [ ] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` ------------- Commit messages: - Fixing build failures after reg2stack - Remove remaining FPU uses in LIRAssembler_x86 - Touchups - Initial fix Changes: https://git.openjdk.org/jdk/pull/24274/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24274&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351156 Stats: 2585 lines in 39 files changed: 0 ins; 2534 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/24274.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24274/head:pull/24274 PR: https://git.openjdk.org/jdk/pull/24274 From epeter at openjdk.org Thu Mar 27 13:17:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Mar 2025 13:17:16 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v5] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 19:24:51 GMT, Jatin Bhateja wrote: >> This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. >> >> Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> Jatin >> >> [1] https://www.felixcloutier.com/x86/vminsh >> [2] https://www.felixcloutier.com/x86/vmaxsh > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution I have not looked at the x64 instructions, but only the tests again. I have noticed that you only cover specific values. You could improve tests with this: - Add non-canonical NaN values. - Just iterate over all possible Float16 input pairs. It's only `2^32`, that should be feasible! Then compare compiled vs interpreted results. It seems that bugs like these happen because somehow we do not systematically cover all inputs. Maybe we should do the same for all Float16 operations? test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 57: > 55: @Run(test = "testMaxNaNOperands") > 56: public void launchMaxNaNOperands() { > 57: RES = testMaxNaNOperands(SRC, Float16.NaN); You are not only using the "canonical" `NaN` in the tests. Are there not other encodings? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24169#pullrequestreview-2721398291 PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2016545317 From epeter at openjdk.org Thu Mar 27 14:07:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Mar 2025 14:07:20 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v5] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 09:24:00 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. 
>> >> # Change Summary >> >> Following the rationale, that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into JDK-8347449-loop-predicate > - Improve help text for UseProfiledLoopPredicate argument > - loopnode: cleaner control flow > - Clean up IR test > - Apply suggestions from @chhagedorn > > Co-authored-by: Christian Hagedorn > - ir-framework: rename new nodes to convention > - ir-framework: fix phase for parse predicate nodes > - Make conditions on UseProfiledLoopPredicate first test UseLoopPredicate > - Turn off UseProfiledLoopPredicate when UseLoopPredicate is turned off > - Add regression IR test > - ... and 1 more: https://git.openjdk.org/jdk/compare/412d134a...72ebfc8e Thanks for working on this! You could also clean up the `IdealKit::loop`, which checks `UseLoopPredicate` only to call `add_parse_predicates`, which adds all predicates... and so it constrains too many things now. 
src/hotspot/share/opto/c2_globals.hpp line 790: > 788: "Move checks with an uncommon trap out of loops based on " \ > 789: "profiling data. " \ > 790: "Requires UseLoopPredicate to be turned on (default).") \ Can you also update the comment for `UseLoopPredicate`? It seems outdated / wrong. Now is: `Generate a predicate to select fast/slow loop versions` @chhagedorn do you have a good suggestion for what to put now? ------------- PR Review: https://git.openjdk.org/jdk/pull/24248#pullrequestreview-2721665164 PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2016743798 From chagedorn at openjdk.org Thu Mar 27 14:30:17 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 27 Mar 2025 14:30:17 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v5] In-Reply-To: References: Message-ID: <01JvdutO8geXQM1nMA6lw-SeC-bNIiApSPykfxnDZls=.3973b39b-7a36-44fa-8c13-91c02268c986@github.com> On Thu, 27 Mar 2025 14:01:47 GMT, Emanuel Peter wrote: >> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8347449-loop-predicate >> - Improve help text for UseProfiledLoopPredicate argument >> - loopnode: cleaner control flow >> - Clean up IR test >> - Apply suggestions from @chhagedorn >> >> Co-authored-by: Christian Hagedorn >> - ir-framework: rename new nodes to convention >> - ir-framework: fix phase for parse predicate nodes >> - Make conditions on UseProfiledLoopPredicate first test UseLoopPredicate >> - Turn off UseProfiledLoopPredicate when UseLoopPredicate is turned off >> - Add regression IR test >> - ... 
and 1 more: https://git.openjdk.org/jdk/compare/44c209f7...72ebfc8e > > src/hotspot/share/opto/c2_globals.hpp line 790: > >> 788: "Move checks with an uncommon trap out of loops based on " \ >> 789: "profiling data. " \ >> 790: "Requires UseLoopPredicate to be turned on (default).") \ > > Can you also update the comment for `UseLoopPredicate`? It seems outdated / wrong. > > Now is: > `Generate a predicate to select fast/slow loop versions` > > @chhagedorn do you have a good suggestion for what to put now? Good catch! It could be similar but without mentioning profiling data: Move checks with an uncommon trap out of loops. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2016804557 From duke at openjdk.org Thu Mar 27 14:31:35 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 27 Mar 2025 14:31:35 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v2] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 14:47:47 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with two additional commits since the last revision: > > - num_8b_elems_in_vec --> nof_vec_elems > - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. . 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2758265834 From thartmann at openjdk.org Thu Mar 27 15:36:18 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 27 Mar 2025 15:36:18 GMT Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces In-Reply-To: References: Message-ID: <-Ri4lJUzCkI9yLG-kGwTGeAhd453SDgt_qvoB1iw4_A=.f3e126ab-a4ff-4f7f-80a7-c6e739cc6727@github.com> On Wed, 26 Mar 2025 09:16:17 GMT, Marc Chevalier wrote: > If `TypeInstKlassPtr` represents an array type, it has to be `java.lang.Object`. From contraposition, if it is not `java.lang.Object`, we can conclude it is not an array, and we can skip some array checks, for instance. > > In this PR, we improve this deduction with an interface base reasoning: arrays implements only Cloneable and Serializable, so if a type implements anything else, it cannot be an array. > > This change partially reverts the changes from [JDK-8348631](https://bugs.openjdk.org/browse/JDK-8348631) (#23331) (in `LibraryCallKit::generate_array_guard_common`) and the test still passes. > > The way interfaces are check might be done differently. The current situation is a balance between visibility (not to leak too much things explicitly private), having not overly general methods for one use-case and avoiding too concrete (and brittle) interfaces. > > Tested with tier1..3, hs-precheckin-comp and hs-comp-stress > > Thanks, > Marc @rwestrel Should have a look at this :) Please add an IR framework test that verifies that layout helper checks are optimized. src/hotspot/share/opto/type.cpp line 3684: > 3682: } > 3683: > 3684: bool TypeInterfaces::has_non_array_interface() const { What about using `TypeAryPtr::_array_interfaces->contains(_interfaces);` instead? ------------- Changes requested by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24245#pullrequestreview-2722219539 PR Review Comment: https://git.openjdk.org/jdk/pull/24245#discussion_r2016955402 From kvn at openjdk.org Thu Mar 27 16:38:12 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 27 Mar 2025 16:38:12 GMT Subject: RFR: 8351156: C1: Remove FPU stack support after 32-bit x86 removal In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 10:18:24 GMT, Aleksey Shipilev wrote: > C1 has the 32-bit x86 specific code that supports x87 FPU stack allocations. With 32-bit x86 port removed, we can clean up those parts. 64-bit x86 does not need x87 FPU, since it is baselined on SSE2 and using XMM registers instead. > > There are lots of deeper cleanups possible, this PR focuses on removing the x87 FPU allocation/uses in C1. Note current C1 nomenclature is confusing. On all arches, "FPU" means floating-point _registers_. That is _except_ on x86, where "FPU" means x87 FPU stack, and "XMM" means floating-point registers. This is why we only touch "FPU" code on other architectures only in a light manner. After all 32-bit x86 cleanups land, we might consider renaming "XMM" -> "FPU" in x86 C1 to match the common nomenclature. > > Brief tour of changes: > - FPU stack simulator is not needed anymore, so I removed it and the related infrastructure > - Lots of 32-bit specific paths that touch x87 FPU registers are pruned > - Related LIR nodes like `lir_fxch`, `lir_fld`, `lir_fpop_raw` are pruned > - Simplified the API that is no longer needed, e.g. dropping `pop_fpu_stack` > > This PR would likely conflict with some in-flight cleanups, so it would require merges later. Take a look meanwhile. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1` + `-XX:TieredStopAtLevel=1` > - [ ] Linux x86_64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` > On all arches, "FPU" means floating-point registers. 
Which is wrong I think. It should be "FPR". May be file RFE to rename `LIR_Opr::is_*_fpu()` methods to `LIR_Opr::is_*_fpr()` src/hotspot/share/c1/c1_LinearScan.cpp line 2662: > 2660: > 2661: } else if (opr->is_single_fpu()) { > 2662: #if defined(AMD64) Which platform uses this code? May be check #if for it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24274#issuecomment-2758680360 PR Review Comment: https://git.openjdk.org/jdk/pull/24274#discussion_r2017073229 From kvn at openjdk.org Thu Mar 27 16:42:20 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 27 Mar 2025 16:42:20 GMT Subject: RFR: 8350471: Unhandled compilation bailout in GraphKit::builtin_throw In-Reply-To: References: Message-ID: <-57H_7bTFarXorcgDvGzJmjhyY1SreDfCpVMT2l5-Uo=.05e3d66a-0e82-4d65-a37a-cfb4adac50c3@github.com> On Wed, 26 Mar 2025 08:55:10 GMT, Manuel Hässig wrote: > # Issue Summary > > When creating a builtin exception node, a stress test decided to bail out as if the allocation of the builtin exception objects had failed. Since these are preallocated at VM creation, the test failure is a false positive. > > # Change Rationale > > `GraphKit::builtin_throw()` features a bailout check after getting an appropriate exception object. However, up to that point, the execution in `builtin_throw()` cannot fail. In particular, there can be no failure to allocate the exception because these are all preallocated during `Threads::create_vm()` startup in `universe_post_init()` and `Threads::initialize_java_lang_classes()`. Further, none of the three callers handles a possible bailout in `builtin_throw()`. Hence, this PR removes the bailout check responsible for the test failure. > > # Testing > > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078715650) > - tier1 through tier3 and Oracle internal testing Thank you for answer to my question. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24243#pullrequestreview-2722584771 From kvn at openjdk.org Thu Mar 27 16:45:14 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 27 Mar 2025 16:45:14 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' [v2] In-Reply-To: References: Message-ID: <7cs94Ld1-t-et2B8BL2g9oDWeK4EvbfSHM-w-yvx-NQ=.90d04264-9ab0-4b20-9325-ea426fe731df@github.com> On Thu, 27 Mar 2025 09:27:58 GMT, Matthias Baesken wrote: >> When running jtreg tests on macOS aarch64 with ubsan - enabled binaries, in the test >> java/foreign/TestHandshake >> this error/warning is reported : >> >> jdk/src/hotspot/share/opto/block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' >> UndefinedBehaviorSanitizer:DEADLYSIGNAL >> UndefinedBehaviorSanitizer: nested bug in the same thread, aborting. >> >> Seems it happens in this calculation (float value does not fit into an int) : int to_pct = (int) ((100 * freq) / target->_freq); > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > calculate from_pct like we did before, clamp to_pct Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23962#pullrequestreview-2722604829 From shade at openjdk.org Thu Mar 27 16:48:12 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 27 Mar 2025 16:48:12 GMT Subject: RFR: 8351156: C1: Remove FPU stack support after 32-bit x86 removal In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 16:35:01 GMT, Vladimir Kozlov wrote: > Which is wrong I think. It should be "FPR". May be file RFE to rename `LIR_Opr::is_*_fpu()` methods to `LIR_Opr::is_*_fpr()` Yes, that would be more accurate. Remains to be seen how intrusive is this rename. 
> src/hotspot/share/c1/c1_LinearScan.cpp line 2662: > >> 2660: >> 2661: } else if (opr->is_single_fpu()) { >> 2662: #if defined(AMD64) > > Which platform uses this code? May be check #if for it. This is generic C1 linear scan. This assert is for x86_64: it is the same assert that we already have. I only translated `#elif` to `#if`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24274#issuecomment-2758720158 PR Review Comment: https://git.openjdk.org/jdk/pull/24274#discussion_r2017114008 From shade at openjdk.org Thu Mar 27 19:41:48 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 27 Mar 2025 19:41:48 GMT Subject: RFR: 8351156: C1: Remove FPU stack support after 32-bit x86 removal [v2] In-Reply-To: References: Message-ID: > C1 has the 32-bit x86 specific code that supports x87 FPU stack allocations. With 32-bit x86 port removed, we can clean up those parts. 64-bit x86 does not need x87 FPU, since it is baselined on SSE2 and using XMM registers instead. > > There are lots of deeper cleanups possible, this PR focuses on removing the x87 FPU allocation/uses in C1. Note current C1 nomenclature is confusing. On all arches, "FPU" means floating-point _registers_. That is _except_ on x86, where "FPU" means x87 FPU stack, and "XMM" means floating-point registers. This is why we only touch "FPU" code on other architectures only in a light manner. After all 32-bit x86 cleanups land, we might consider renaming "XMM" -> "FPU" in x86 C1 to match the common nomenclature. > > Brief tour of changes: > - FPU stack simulator is not needed anymore, so I removed it and the related infrastructure > - Lots of 32-bit specific paths that touch x87 FPU registers are pruned > - Related LIR nodes like `lir_fxch`, `lir_fld`, `lir_fpop_raw` are pruned > - Simplified the API that is no longer needed, e.g. dropping `pop_fpu_stack` > > This PR would likely conflict with some in-flight cleanups, so it would require merges later. Take a look meanwhile. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1` + `-XX:TieredStopAtLevel=1` > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into JDK-8351156-x86-c1-fpustack - Fixing build failures after reg2stack - Remove remaining FPU uses in LIRAssembler_x86 - Touchups - Initial fix ------------- Changes: https://git.openjdk.org/jdk/pull/24274/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24274&range=01 Stats: 2555 lines in 39 files changed: 0 ins; 2508 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/24274.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24274/head:pull/24274 PR: https://git.openjdk.org/jdk/pull/24274 From dlong at openjdk.org Thu Mar 27 19:53:21 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 27 Mar 2025 19:53:21 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' [v2] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 09:27:58 GMT, Matthias Baesken wrote: >> When running jtreg tests on macOS aarch64 with ubsan - enabled binaries, in the test >> java/foreign/TestHandshake >> this error/warning is reported : >> >> jdk/src/hotspot/share/opto/block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' >> UndefinedBehaviorSanitizer:DEADLYSIGNAL >> UndefinedBehaviorSanitizer: nested bug in the same thread, aborting. 
>> >> Seems it happens in this calculation (float value does not fit into an int) : int to_pct = (int) ((100 * freq) / target->_freq); > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > calculate from_pct like we did before, clamp to_pct Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23962#pullrequestreview-2723235235 From vlivanov at openjdk.org Thu Mar 27 20:05:14 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 27 Mar 2025 20:05:14 GMT Subject: RFR: 8351156: C1: Remove FPU stack support after 32-bit x86 removal [v2] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 19:41:48 GMT, Aleksey Shipilev wrote: >> C1 has the 32-bit x86 specific code that supports x87 FPU stack allocations. With 32-bit x86 port removed, we can clean up those parts. 64-bit x86 does not need x87 FPU, since it is baselined on SSE2 and using XMM registers instead. >> >> There are lots of deeper cleanups possible, this PR focuses on removing the x87 FPU allocation/uses in C1. Note current C1 nomenclature is confusing. On all arches, "FPU" means floating-point _registers_. That is _except_ on x86, where "FPU" means x87 FPU stack, and "XMM" means floating-point registers. This is why we only touch "FPU" code on other architectures only in a light manner. After all 32-bit x86 cleanups land, we might consider renaming "XMM" -> "FPU" in x86 C1 to match the common nomenclature. >> >> Brief tour of changes: >> - FPU stack simulator is not needed anymore, so I removed it and the related infrastructure >> - Lots of 32-bit specific paths that touch x87 FPU registers are pruned >> - Related LIR nodes like `lir_fxch`, `lir_fld`, `lir_fpop_raw` are pruned >> - Simplified the API that is no longer needed, e.g. dropping `pop_fpu_stack` >> >> This PR would likely conflict with some in-flight cleanups, so it would require merges later. Take a look meanwhile. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1` + `-XX:TieredStopAtLevel=1` >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8351156-x86-c1-fpustack > - Fixing build failures after reg2stack > - Remove remaining FPU uses in LIRAssembler_x86 > - Touchups > - Initial fix Looks good. src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 586: > 584: > 585: case T_FLOAT: { > 586: if (dest->is_single_xmm()) { Time to turn it into an assert? src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 600: > 598: > 599: case T_DOUBLE: { > 600: if (dest->is_double_xmm()) { Same here. An assert maybe? ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24274#pullrequestreview-2723223867 PR Review Comment: https://git.openjdk.org/jdk/pull/24274#discussion_r2017506683 PR Review Comment: https://git.openjdk.org/jdk/pull/24274#discussion_r2017507075 From shade at openjdk.org Thu Mar 27 20:10:13 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 27 Mar 2025 20:10:13 GMT Subject: RFR: 8351156: C1: Remove FPU stack support after 32-bit x86 removal [v2] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 19:44:57 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: >> >> - Merge branch 'master' into JDK-8351156-x86-c1-fpustack >> - Fixing build failures after reg2stack >> - Remove remaining FPU uses in LIRAssembler_x86 >> - Touchups >> - Initial fix > > src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 586: > >> 584: >> 585: case T_FLOAT: { >> 586: if (dest->is_single_xmm()) { > > Time to turn it into an assert? I think the intent for `ShouldNotReachHere()`-s in C1 code is to crash out on a gross error, instead of silently miscompiling. I would not be 100% sure src type and dst reg type are always correct. It is basically `guarantee` in disguise, AFAIU. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24274#discussion_r2017533616 From duke at openjdk.org Fri Mar 28 00:18:41 2025 From: duke at openjdk.org (Mohamed Issa) Date: Fri, 28 Mar 2025 00:18:41 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v2] In-Reply-To: References: Message-ID: > The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. > > 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. > 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 20**. This new endpoint is the closest value above the minimum (**55 * ln(2) / 2**) required for correctness that's possible when only retrieving the topmost word of the input register. > > The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v24-b33](https://github.com/openjdk/jdk/releases/tag/jdk-24%2B33) as the baseline version. 
> > For performance data collected with the regression micro-benchmark referenced in the bug report, see the table below. Each result is the mean of 3 individual runs. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly improve execution times to the point where they are almost at parity with the baseline. Also, there is almost no impact to the low value (1, 2) scenarios. > > | Input range (+/-) | Baseline (ms) | No fix (ms) | With fix (ms) | No fix vs baseline (%) | Fix vs baseline (%) | > | :------------------: | :-------------: | :-----------: | :-------------: | :----------------------: | :-------------------: | > | 1 | 1842 | 1961 | 1969 | +6.46 | +6.89 | > | 2 | 2102 | 2010 | 1998 | -4.38 | -4.95 | > | 100 | 801 | 1018 | 716 | +27.09 | -10.61 | > | 1000 | 498 | 803 | 519 | +61.24 | +4.22 | > | 10000 | 474 | 755 | 491 | +59.28 | +3.59 | > | 100000 | 473 | 758 | 491 | +60.25 | +3.81 ... Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: Change tanh intrinsic endpoint comparison value to match reference OpenJDK implementation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23889/files - new: https://git.openjdk.org/jdk/pull/23889/files/7addfd36..e563fd73 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=00-01 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23889.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23889/head:pull/23889 PR: https://git.openjdk.org/jdk/pull/23889 From sviswanathan at openjdk.org Fri Mar 28 00:18:41 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 28 Mar 2025 00:18:41 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 00:16:02 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the 
performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. >> >> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. >> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 20**. This new endpoint is the closest value above the minimum (**55 * ln(2) / 2**) required for correctness that's possible when only retrieving the topmost word of the input register. >> >> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v24-b33](https://github.com/openjdk/jdk/releases/tag/jdk-24%2B33) as the baseline version. >> >> For performance data collected with the regression micro-benchmark referenced in the bug report, see the table below. Each result is the mean of 3 individual runs. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly improve execution times to the point where they are almost at parity with the baseline. Also, there is almost no impact to the low value (1, 2) scenarios. >> >> | Input range (+/-) | Baseline (ms) | No fix (ms) | With fix (ms) | No fix vs baseline (%) | Fix vs baseline (%) | >> | :------------------: | :-------------: | :-----------: | :-------------: | :----------------------: | :-------------------: | >> | 1 | 1842 | 1961 | 1969 | +6.46 | +6.89 | >> | 2 | 2102 | 2010 | 1998 | -4.38 | -4.95 | >> | 100 | 801 | 1018 | 716 | +27.09 | -10.61 | >> | 1000 | 498 | 803 | 519 | +61.24 | +4.22 | >> | 10000 | 474 | 755 | 491 | +59.28 | +3.59 | >> | 100000 | 473 | 758 | 491 | +60.25 ... 
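As a rough cross-check of point 2 in the description quoted above, the Java sketch below reproduces the threshold arithmetic. It is not part of the patch: the class and method names are made up for illustration, and it assumes that "topmost word" means the upper 16 bits of the IEEE 754 double encoding with the sign bit masked off, which is what the `andl(rcx, 32767)` / `cmpl(rcx, 16436)` sequence quoted in the review comment that follows appears to compare.

```java
// Illustrative only: checks the threshold arithmetic, not the HotSpot stub itself.
final class TanhThresholdSketch {
    // Upper 16 bits of the IEEE 754 double encoding, with the sign bit cleared.
    static int topWordAbs(double x) {
        return (int) ((Double.doubleToRawLongBits(x) >>> 48) & 0x7FFF);
    }

    // Smallest non-negative double whose upper 16 bits equal `word`.
    static double smallestWithTopWord(int word) {
        return Double.longBitsToDouble(((long) word) << 48);
    }

    public static void main(String[] args) {
        double minimum = 55 * Math.log(2) / 2;  // roughly 19.06
        System.out.println("minimum       = " + minimum);
        System.out.println("word(20.0)    = " + topWordAbs(20.0));           // 16436 (0x4034)
        System.out.println("word(22.0)    = " + topWordAbs(22.0));           // 16438 (0x4036)
        System.out.println("word 16435 at = " + smallestWithTopWord(16435)); // 19.0
    }
}
```

Under that assumption, the word granularity in [16, 32) is 1.0, so word 16435 (19.0) falls just below the stated minimum and 20.0 is the first endpoint a word-only comparison can use; a |x| >= 22 endpoint would correspond to word 16438.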
> > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Change tanh intrinsic endpoint comparison value to match reference OpenJDK implementation src/hotspot/cpu/x86/stubGenerator_x86_64_tanh.cpp line 332: > 330: __ andl(rcx, 32767); > 331: __ cmpl(rcx, 16436); > 332: __ jcc(Assembler::aboveEqual, L_2TAG_PACKET_2_0_1); // Branch only if |x| >= 20 It will be good to return +/- 1 when |x| >= 22 instead of 20 to match the Java code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2017745981 From duke at openjdk.org Fri Mar 28 00:56:23 2025 From: duke at openjdk.org (Mohamed Issa) Date: Fri, 28 Mar 2025 00:56:23 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 00:15:59 GMT, Sandhya Viswanathan wrote: >> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: >> >> Change tanh intrinsic endpoint comparison value to match reference OpenJDK implementation > > src/hotspot/cpu/x86/stubGenerator_x86_64_tanh.cpp line 332: > >> 330: __ andl(rcx, 32767); >> 331: __ cmpl(rcx, 16436); >> 332: __ jcc(Assembler::aboveEqual, L_2TAG_PACKET_2_0_1); // Branch only if |x| >= 20 > > It will be good to return +/- 1 when |x| >= 22 instead of 20 to match the Java code. Yes, I made the change. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2017767369 From duke at openjdk.org Fri Mar 28 02:06:19 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 28 Mar 2025 02:06:19 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory In-Reply-To: References: <7jVhKpVfHqZRix25pTXA28NA2RW2OKuWH1XR-TIfpLw=.ccd76cbd-7d73-4df6-8f54-0512a1d0cae9@github.com> Message-ID: On Thu, 27 Mar 2025 10:21:03 GMT, Martin Doerr wrote: >> You may want to use `UnsafeMemoryAccessMark` as on x86. > >> @TheRealMDoerr Thanks for your kindly reply. 
I found that the main logic on the x86 is in 'generate_unsafe_setmemory' function, while the main logic on the riscv and aarch64 is in generate_fill. I have not found 'UnsafeMemoryAccess' on aarch64 in generate_fill, I will check whether we need to add it and where to insert it if needed on riscv. > > Note that generate_fill is normally not used for Unsafe accesses and hence doesn't need 'UnsafeMemoryAccessMark'. I'm using it a bit differently here: https://github.com/openjdk/jdk/pull/24254 > > @TheRealMDoerr Thanks for your kindly reply. I found that the main logic on the x86 is in 'generate_unsafe_setmemory' function, while the main logic on the riscv and aarch64 is in generate_fill. I have not found 'UnsafeMemoryAccess' on aarch64 in generate_fill, I will check whether we need to add it and where to insert it if needed on riscv. > > Note that generate_fill is normally not used for Unsafe accesses and hence doesn't need 'UnsafeMemoryAccessMark'. I'm using it a bit differently here: #24254 That makes sense, I'll try learn from your way of adding 'UnsafeMemoryAccessMark' and then test it ------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2759975260 From jbhateja at openjdk.org Fri Mar 28 04:50:07 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 28 Mar 2025 04:50:07 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v6] In-Reply-To: References: Message-ID: > This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. > > Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. 
> > Kindly review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.felixcloutier.com/x86/vminsh > [2] https://www.felixcloutier.com/x86/vmaxsh Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding custom NaN generator ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24169/files - new: https://git.openjdk.org/jdk/pull/24169/files/5bc21b99..e2faec77 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=04-05 Stats: 61 lines in 1 file changed: 31 ins; 0 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/24169.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24169/head:pull/24169 PR: https://git.openjdk.org/jdk/pull/24169 From jbhateja at openjdk.org Fri Mar 28 04:53:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 28 Mar 2025 04:53:17 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v5] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 13:14:39 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution > > I have not looked at the x64 instructions, but only the tests again. > > I have noticed that you only cover specific values. You could improve tests with this: > - Add non-canonical NaN values. > - Just iterate over all possible Float16 input pairs. It's only `2^32`, that should be feasible! Then compare compiled vs interpreted results. > > It seems that bugs like these happen because somehow we do not systematically cover all inputs. Maybe we should do the same for all Float16 operations? Hi @eme64, This specific issue is about special Float16 values, i.e. +/- 0.0 and NaN. 
I have added a Generator for Float16 as part of https://github.com/openjdk/jdk/pull/22755 Best Regards, Jatin > test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 57: > >> 55: @Run(test = "testMaxNaNOperands") >> 56: public void launchMaxNaNOperands() { >> 57: RES = testMaxNaNOperands(SRC, Float16.NaN); > > You are not only using the "canonical" `NaN` in the tests. Are there not other encodings? DONE I created a custom NaN generator to generate canonical NaN values. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24169#issuecomment-2760176991 PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2017912662 From jbhateja at openjdk.org Fri Mar 28 04:56:14 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 28 Mar 2025 04:56:14 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v6] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 07:56:58 GMT, Emanuel Peter wrote: > I looked at the changes in `Generators.java`, thanks for adding some code there. > > Some comments on it: > > * You should add some Float16 tests to `test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java`. > * I am missing the "mixed distribution" function `float16s()`. As a reference, take `public Generator doubles()`. The idea is that we have a set of distributions, and we pick a random distribution every time in the tests. > * I'm also missing an "any bits" version, where you would take a random short value and reinterpret it as `Float16`. This ensures that we are getting all possible encodings, including multiple NaN encodings. > * All of this is probably enough code to make a separate PR. Hi @eme64 Your comments have been addressed. 
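To make the "multiple NaN encodings" point from the quoted review concrete, here is a minimal Java sketch that works directly on raw 16-bit patterns. It is not the `Generators.java` code from the PR: the class and method names are hypothetical, and it assumes the standard IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits) rather than using the incubating `Float16` class.

```java
// Illustrative only: operates on raw binary16 bit patterns, not on the Float16 class.
final class Float16BitsSketch {
    static final int EXP_MASK = 0x7C00;   // 5 exponent bits
    static final int FRAC_MASK = 0x03FF;  // 10 fraction bits

    // NaN: exponent all ones and a non-zero fraction (a zero fraction is infinity).
    static boolean isNaN(short bits) {
        return (bits & EXP_MASK) == EXP_MASK && (bits & FRAC_MASK) != 0;
    }

    // Walks every 16-bit pattern and counts those that encode a NaN.
    static int countNaNEncodings() {
        int count = 0;
        for (int i = 0; i <= 0xFFFF; i++) {
            if (isNaN((short) i)) {
                count++;
            }
        }
        return count;
    }

    // "Any bits" style source: every 16-bit pattern is equally likely.
    static short anyBits(java.util.Random random) {
        return (short) random.nextInt(1 << 16);
    }

    public static void main(String[] args) {
        System.out.println(countNaNEncodings());  // 2046
    }
}
```

With this layout there are 2 * 1023 = 2046 NaN patterns, so a test that only feeds a single NaN constant exercises one of them; a uniform "any bits" source hits a NaN in roughly 3% of draws.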
------------- PR Comment: https://git.openjdk.org/jdk/pull/22755#issuecomment-2760191652 From galder at openjdk.org Fri Mar 28 05:13:20 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 28 Mar 2025 05:13:20 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects In-Reply-To: References: Message-ID: <6UoVmJrL7ZVcRvTUuCrPoqCJB0jcQJH7y-v2JWS7nlY=.2fccddcf-1843-4453-bf38-3e7f01bb39fc@github.com> On Tue, 25 Mar 2025 11:15:58 GMT, Emanuel Peter wrote: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. > > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. Changes requested by galder (Author). test/hotspot/jtreg/compiler/lib/verify/Verify.java line 56: > 54: *

> 55: * By default, we only support comparison of the types mentioned above. However, in some cases one > 56: * might want to compare Objects of arbitrare classes by value, i.e. the recursive structure given Suggestion: * might want to compare Objects of arbitrary classes by value, i.e. the recursive structure given test/hotspot/jtreg/compiler/lib/verify/Verify.java line 79: > 77: * @throws VerifyException If the comparison fails. > 78: */ > 79: public static void checkEQ(Object a, Object b, boolean isFloatCheckWithRawBits, boolean isCheckWithArbitraryClasses) { Just a suggestion. One boolean might be ok, but once you start adding 2 booleans it seems like a bit of a code smell to me. Do you envision more options being added? I would personally create a `VerifyOptions` record with the boolean flags options and pass that in. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 264: > 262: private void checkEQimpl(float a, float b, String field, Object aParent, Object bParent) { > 263: if (isFloatEQ(a, b)) { > 264: System.err.println("ERROR: Verify.checkEQ failed: value mismatch. check raw: " + isFloatCheckWithRawBits); Would using a text block here make it more readable? test/hotspot/jtreg/compiler/lib/verify/Verify.java line 277: > 275: private void checkEQimpl(double a, double b, String field, Object aParent, Object bParent) { > 276: if (isDoubleEQ(a, b)) { > 277: System.err.println("ERROR: Verify.checkEQ failed: value mismatch. check raw: " + isFloatCheckWithRawBits); Same text block comment here test/hotspot/jtreg/compiler/lib/verify/Verify.java line 472: > 470: > 471: private void print(Object a, Object b, String field, Object aParent, Object bParent) { > 472: System.err.println(" aParent: " + aParent); Text block? test/hotspot/jtreg/testlibrary_tests/verify/tests/TestVerify.java line 691: > 689: } > 690: > 691: public static void checkNE(Object a, Object b) { If `checkEQ` lives in Verify, wouldn't it make sense to also have `checkNE` there? 
Seems like the natural place making the API symmetric. ------------- PR Review: https://git.openjdk.org/jdk/pull/24224#pullrequestreview-2724270449 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2017921263 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2017924041 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2017926106 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2017926245 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2017927145 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2017930961 From mbaesken at openjdk.org Fri Mar 28 08:22:34 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 28 Mar 2025 08:22:34 GMT Subject: RFR: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' [v2] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 09:27:58 GMT, Matthias Baesken wrote: >> When running jtreg tests on macOS aarch64 with ubsan - enabled binaries, in the test >> java/foreign/TestHandshake >> this error/warning is reported : >> >> jdk/src/hotspot/share/opto/block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' >> UndefinedBehaviorSanitizer:DEADLYSIGNAL >> UndefinedBehaviorSanitizer: nested bug in the same thread, aborting. >> >> Seems it happens in this calculation (float value does not fit into an int) : int to_pct = (int) ((100 * freq) / target->_freq); > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > calculate from_pct like we did before, clamp to_pct Thanks for the reviews ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23962#issuecomment-2760520418 From mbaesken at openjdk.org Fri Mar 28 08:22:34 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 28 Mar 2025 08:22:34 GMT Subject: Integrated: 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 13:37:23 GMT, Matthias Baesken wrote: > When running jtreg tests on macOS aarch64 with ubsan - enabled binaries, in the test > java/foreign/TestHandshake > this error/warning is reported : > > jdk/src/hotspot/share/opto/block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' > UndefinedBehaviorSanitizer:DEADLYSIGNAL > UndefinedBehaviorSanitizer: nested bug in the same thread, aborting. > > Seems it happens in this calculation (float value does not fit into an int) : int to_pct = (int) ((100 * freq) / target->_freq); This pull request has now been integrated. Changeset: ddf326b8 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/ddf326b8e6e50403303b410635e4c26d7bf56aaa Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8346888: [ubsan] block.cpp:1617:30: runtime error: 9.97582e+36 is outside the range of representable values of type 'int' Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/23962 From epeter at openjdk.org Fri Mar 28 08:44:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Mar 2025 08:44:29 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v2] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. 
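The first bullet above (different NaN encodings seen as equal by default) can be sketched as follows. This is an assumption-laden illustration, not the `Verify.java` implementation: `NaNAwareEqSketch` and `floatEq` are hypothetical names, and the `rawBits` parameter only loosely mirrors the `isFloatCheckWithRawBits` flag mentioned in the review.

```java
// Illustrative only: one possible reading of "NaN encodings are equal by default".
final class NaNAwareEqSketch {
    static boolean floatEq(float a, float b, boolean rawBits) {
        if (!rawBits && Float.isNaN(a) && Float.isNaN(b)) {
            return true;  // any NaN matches any other NaN, whatever the payload
        }
        // Otherwise compare exact bit patterns, so +0.0f and -0.0f stay distinct.
        return Float.floatToRawIntBits(a) == Float.floatToRawIntBits(b);
    }

    public static void main(String[] args) {
        float nan1 = Float.intBitsToFloat(0x7FC00000);  // default quiet NaN
        float nan2 = Float.intBitsToFloat(0x7FC00001);  // quiet NaN with a payload bit
        System.out.println(floatEq(nan1, nan2, false));
        System.out.println(floatEq(nan1, nan2, true));
    }
}
```

Comparing bit patterns rather than using `==` is a design choice in this sketch: it keeps the check strict for signed zeros while the NaN branch relaxes it only where the encodings are semantically interchangeable.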
> > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects - clean up test - JDK-8352869 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/3457af00..55e7771c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=00-01 Stats: 42339 lines in 1437 files changed: 5810 ins; 34107 del; 2422 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From duke at openjdk.org Fri Mar 28 08:52:13 2025 From: duke at openjdk.org (Manuel Hässig) Date: Fri, 28 Mar 2025 08:52:13 GMT Subject: RFR: 8350471: Unhandled compilation bailout in GraphKit::builtin_throw In-Reply-To: References: Message-ID: <0FBapfdvy_cYmOu9qKKhcGxCsmDdK7QTF5iXknCHXZE=.28215dca-7016-47ec-a41d-25a0acd5e819@github.com> On Wed, 26 Mar 2025 08:55:10 GMT, Manuel Hässig wrote: > # Issue Summary > > When creating a builtin exception node, a stress test decided to bail out as if the allocation of the builtin exception objects had failed. Since these are preallocated at VM creation, the test failure is a false positive. > > # Change Rationale > > `GraphKit::builtin_throw()` features a bailout check after getting an appropriate exception object. However, up to that point, the execution in `builtin_throw()` cannot fail. 
In particular, there can be no failure to allocate the exception because these are all preallocated during `Threads::create_vm()` startup in `universe_post_init()` and `Threads::initialize_java_lang_classes()`. Further, none of the three callers handles a possible bailout in `builtin_throw()`. Hence, this PR removes the bailout check responsible for the test failure. > > # Testing > > - [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/14078715650) > - tier1 through tier3 and Oracle internal testing Thank you for the review, everyone! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24243#issuecomment-2760580816 From duke at openjdk.org Fri Mar 28 08:52:13 2025 From: duke at openjdk.org (duke) Date: Fri, 28 Mar 2025 08:52:13 GMT Subject: RFR: 8350471: Unhandled compilation bailout in GraphKit::builtin_throw In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 08:55:10 GMT, Manuel Hässig wrote: > # Issue Summary > > When creating a builtin exception node, a stress test decided to bail out as if the allocation of the builtin exception objects had failed. Since these are preallocated at VM creation, the test failure is a false positive. > > # Change Rationale > > `GraphKit::builtin_throw()` features a bailout check after getting an appropriate exception object. However, up to that point, the execution in `builtin_throw()` cannot fail. In particular, there can be no failure to allocate the exception because these are all preallocated during `Threads::create_vm()` startup in `universe_post_init()` and `Threads::initialize_java_lang_classes()`. Further, none of the three callers handles a possible bailout in `builtin_throw()`.
Hence, this PR removes the bailout check responsible for the test failure. > > # Testing > > - [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/14078715650) > - tier1 through tier3 and Oracle internal testing @mhaessig Your change (at version 8ca6eef274509f903f9a16e7b12d888a3a1ea9b3) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24243#issuecomment-2760583067 From duke at openjdk.org Fri Mar 28 08:56:38 2025 From: duke at openjdk.org (Manuel Hässig) Date: Fri, 28 Mar 2025 08:56:38 GMT Subject: Integrated: 8350471: Unhandled compilation bailout in GraphKit::builtin_throw In-Reply-To: References: Message-ID: <5m36p1FENte8N8qjpcpddPgdQ2bRhBeo6WUARGTT32Q=.dfafdb06-f314-4e93-b944-661c657dc0bf@github.com> On Wed, 26 Mar 2025 08:55:10 GMT, Manuel Hässig wrote: > # Issue Summary > > When creating a builtin exception node, a stress test decided to bail out as if the allocation of the builtin exception objects had failed. Since these are preallocated at VM creation, the test failure is a false positive. > > # Change Rationale > > `GraphKit::builtin_throw()` features a bailout check after getting an appropriate exception object. However, up to that point, the execution in `builtin_throw()` cannot fail. In particular, there can be no failure to allocate the exception because these are all preallocated during `Threads::create_vm()` startup in `universe_post_init()` and `Threads::initialize_java_lang_classes()`. Further, none of the three callers handles a possible bailout in `builtin_throw()`. Hence, this PR removes the bailout check responsible for the test failure. > > # Testing > > - [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/14078715650) > - tier1 through tier3 and Oracle internal testing This pull request has now been integrated.
Changeset: 8ef78323 Author: Manuel Hässig Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/8ef78323b1177782a645155fda19544fae24c279 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8350471: Unhandled compilation bailout in GraphKit::builtin_throw Reviewed-by: thartmann, chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/24243 From duke at openjdk.org Fri Mar 28 09:09:59 2025 From: duke at openjdk.org (Manuel Hässig) Date: Fri, 28 Mar 2025 09:09:59 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6] In-Reply-To: References: Message-ID: > # Issue Summary > > When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates. > > # Change Summary > > Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. > > Concretely, this PR > - adds parse predicate nodes to the IR testing framework, > - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, > - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, > - adds a regression test.
> > > # Testing > > The changes passed the following testing: > - [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) > - tier1 through tier3 and Oracle internal testing Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: - idealKit::loop: always call add_parse_predicates It was constrained on UseParsePredicate, but this is incorrect, since all parse predicates are added in that function. - Improve description of UseLoopPredicate argument ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24248/files - new: https://git.openjdk.org/jdk/pull/24248/files/72ebfc8e..1561a0ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24248&range=04-05 Stats: 9 lines in 2 files changed: 0 ins; 2 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24248.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24248/head:pull/24248 PR: https://git.openjdk.org/jdk/pull/24248 From duke at openjdk.org Fri Mar 28 09:10:00 2025 From: duke at openjdk.org (Manuel Hässig) Date: Fri, 28 Mar 2025 09:10:00 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v5] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 14:04:15 GMT, Emanuel Peter wrote: > You could also clean up the `IdealKit::loop`, which checks `UseLoopPredicate` only to call add_parse_predicates, which adds all predicates... and so it constrains too many things now. Cleaned up in [1561a0e](https://github.com/openjdk/jdk/pull/24248/commits/1561a0eea3b2049e4e9e6468d0237f60e97cd2e8). I also reran testing and everything passed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24248#issuecomment-2760619412 From duke at openjdk.org Fri Mar 28 09:10:01 2025 From: duke at openjdk.org (Manuel Hässig) Date: Fri, 28 Mar 2025 09:10:01 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v5] In-Reply-To: <01JvdutO8geXQM1nMA6lw-SeC-bNIiApSPykfxnDZls=.3973b39b-7a36-44fa-8c13-91c02268c986@github.com> References: <01JvdutO8geXQM1nMA6lw-SeC-bNIiApSPykfxnDZls=.3973b39b-7a36-44fa-8c13-91c02268c986@github.com> Message-ID: On Thu, 27 Mar 2025 14:28:00 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/c2_globals.hpp line 790: >> >>> 788: "Move checks with an uncommon trap out of loops based on " \ >>> 789: "profiling data. " \ >>> 790: "Requires UseLoopPredicate to be turned on (default).") \ >> >> Can you also update the comment for `UseLoopPredicate`? It seems outdated / wrong. >> >> Now is: >> `Generate a predicate to select fast/slow loop versions` >> >> @chhagedorn do you have a good suggestion for what to put now? > > Good catch! It could be similar but without mentioning profiling data: > > Move checks with an uncommon trap out of loops. Fixed in [ca10148](https://github.com/openjdk/jdk/pull/24248/commits/ca101483aac17b0ace223df0f8a62bfd0dfa2e1f) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24248#discussion_r2018190355 From epeter at openjdk.org Fri Mar 28 09:24:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Mar 2025 09:24:33 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v3] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection.
> > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/lib/verify/Verify.java Co-authored-by: Galder Zamarreño ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/55e7771c..00115d45 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From epeter at openjdk.org Fri Mar 28 09:28:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Mar 2025 09:28:14 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v3] In-Reply-To: <6UoVmJrL7ZVcRvTUuCrPoqCJB0jcQJH7y-v2JWS7nlY=.2fccddcf-1843-4453-bf38-3e7f01bb39fc@github.com> References: <6UoVmJrL7ZVcRvTUuCrPoqCJB0jcQJH7y-v2JWS7nlY=.2fccddcf-1843-4453-bf38-3e7f01bb39fc@github.com> Message-ID: <2HOt5bmHQa0LR8kS1LVr3BWiWvQBJXlLaAouDXOmNhg=.8051c0bc-37b2-405b-9684-674da915d284@github.com> On Fri, 28 Mar 2025 05:04:42 GMT, Galder Zamarreño wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/lib/verify/Verify.java >> >> Co-authored-by: Galder Zamarreño > > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 264: > >> 262: private void checkEQimpl(float a, float b, String field, Object aParent, Object bParent) { >> 263: if (isFloatEQ(a, b)) { >> 264: System.err.println("ERROR: Verify.checkEQ failed: value mismatch.
check raw: " + isFloatCheckWithRawBits); > > Would using a text block here make it more readable? @galderz What do you mean by a text block? String Templates would be nice, but we don't have them. Do you mean I should use `String.format`? But it is always so tricky to know what is going to be formatted to where... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2018220193 From roland at openjdk.org Fri Mar 28 09:34:51 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 28 Mar 2025 09:34:51 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v3] In-Reply-To: References: Message-ID: <8eWbY81i3q3Jeql0b-alefXqk5_rgteLvxxXrnMQSkg=.ea3a536a-2d1d-4545-af1b-6ae73743dc0f@github.com> > The test crashes because of a division by zero. The `Div` node for > that one is initially part of a counted loop. The control input of the > node is cleared because the divisor is non zero. This is because the > divisor depends on the loop phi and the type of the loop phi is > narrowed down when the counted loop is created. Pre/main/post loops > are created, unrolling happens, the main loop loses its backedge. The > `Div` node can then float above the zero trip guard for the main > loop. When the zero trip guard is not taken, there's no guarantee the > divisor is non zero so the `Div` node should be pinned below it. > > I propose we revert the change I made with 8334724 which removed > `PhaseIdealLoop::cast_incr_before_loop()`. The `CastII` that this > method inserted was there to handle exactly this problem. It was added > initially for a similar issue but with array loads. That problem with > loads is handled some other way now and that's why I thought it was > safe to proceed with the removal.
> > The code in this patch is somewhat different from the one we had > before for a couple reasons: > > 1- assert predicate code evolved and so previous logic can't be > resurrected as it was. > > 2- the previous logic has a bug. > > Regarding 1-: during pre/main/post loop creation, we used to add the > `CastII` and then to add assertion predicates (so assertion predicates > depended on the `CastII`). Then when unrolling, when assertion > predicates are updated, we would skip over the `CastII`. What I > propose here is to add the `CastII` after assertion predicates are > added. As a result, they don't depend on the `CastII` and there's no > need for any extra logic when unrolling happens. This, however, > doesn't work when the assertion predicates are added by RCE. In that > case, I had to add logic to skip over the `CastII` (similar to what > existed before I removed it). > > Regarding 2-: previous implementation for > `PhaseIdealLoop::cast_incr_before_loop()` would add the `CastII` at > the first loop `Phi` it encounters that's a use of the loop increment: > it's usually the iv but not always. I tweaked the test case to show, > this bug can actually cause a crash and changed the logic for > `PhaseIdealLoop::cast_incr_before_loop()` accordingly. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: - other test + review comment - Merge branch 'master' into JDK-8349139 - Merge branch 'master' into JDK-8349139 - Merge branch 'master' into JDK-8349139 - fix & test ------------- Changes: https://git.openjdk.org/jdk/pull/23617/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23617&range=02 Stats: 199 lines in 7 files changed: 168 ins; 25 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23617.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23617/head:pull/23617 PR: https://git.openjdk.org/jdk/pull/23617 From roland at openjdk.org Fri Mar 28 09:34:51 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 28 Mar 2025 09:34:51 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v2] In-Reply-To: References: <52OYoC5__FdcN8OLwVgdNlb6Fz_IFo8UyKy3GUp5DiM=.708f1ee8-dbbb-4abf-8de0-d94b3b1e2ef6@github.com> Message-ID: <9GhSsGe4ozoLE9e3LpgshLfQoIJVT31mrD5pTw_6qww=.427e48e5-6d8d-40a8-b976-4c3a31f36441@github.com> On Tue, 25 Feb 2025 08:19:59 GMT, Emanuel Peter wrote: > Are you saying this is another test? I'd really be happy if we had more tests for this case, because the current version seems fragile, since it is an almost perfect copy of a previous test that became ineffective this makes me even more nervous ? I added that test case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23617#issuecomment-2760707077 From roland at openjdk.org Fri Mar 28 09:34:52 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 28 Mar 2025 09:34:52 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v2] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 08:07:27 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: >> >> - Merge branch 'master' into JDK-8349139 >> - fix & test > > src/hotspot/share/opto/loopTransform.cpp line 1703: > >> 1701: } >> 1702: // CastII for the new post loop: >> 1703: cast_incr_before_loop(zer_opaq->in(1), zer_taken, post_head); > > I see it is added for the main and post loop. Why not for the pre loop? The type of the trip `phi` of the loop before pre/main/post is computed from the type of the init value (input 1 of the trip `phi`) and the limit input to the exit condition. The way the trip `phi` type is computed, it can't be narrower than the init value type. If the loop body has a `Div` node and its control dependency is cleared because the divisor is known to be non-zero from the type of the trip `phi`, then the divisor computed from the init value is also not zero and if the backedge of the loop is removed, the `Div` node can float. For example: `for (int i = start; i < stop; i += stride) { v = x / i; }` If `i` is not zero, it's because its type is computed from `start` and `start` is not zero so, if, for some reason, the loop runs for a single iteration, `v = x/start;` can float as high as the control of `start`. The same reasoning applies to the pre loop because the trip `phi` of that loop "includes" the whole type of the init value. But that doesn't work for the main (or post) loop because its trip `phi` gets its type from the loop before transformation and it captures the type of the init value for what is now the pre loop, which is not the same as the type of the init value for the main (or post) loop. In the example above, let's say `start` has type `[min, -1]`, `stop` is 0 and `stride` is 1. `i`'s type is `[min, -1]`. That doesn't include 0 so the `Div` is free to float. Now once pre/main/post loops are created, the type of the trip `phi` for the main loop is `[min, -1]`. Now let's say the main loop loses its back edge. The `Div` in the main loop can float and say the value for `i` out of the pre loop is 0.
Now the floating `Div` from the main loop can trigger a fault. > src/hotspot/share/opto/loopnode.cpp line 6091: > >> 6089: if (uncast && init->is_CastII()) { >> 6090: // skip over the cast added by PhaseIdealLoop::cast_incr_before_loop() when pre/post/main loops are created because >> 6091: // it can get in the way of type propagation > > I think it would be nice if you said more about how it can get in the way of type propagation. Why would we sometimes have `uncast` on and sometimes off? You may even have a quick comment about it at the use-site. I improved the comment in the new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23617#discussion_r2018234487 PR Review Comment: https://git.openjdk.org/jdk/pull/23617#discussion_r2018236157 From roland at openjdk.org Fri Mar 28 09:37:13 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 28 Mar 2025 09:37:13 GMT Subject: RFR: 8349139: C2: Div looses dependency on condition that guarantees divisor not null in counted loop [v2] In-Reply-To: References: <52OYoC5__FdcN8OLwVgdNlb6Fz_IFo8UyKy3GUp5DiM=.708f1ee8-dbbb-4abf-8de0-d94b3b1e2ef6@github.com> Message-ID: On Tue, 25 Feb 2025 08:19:59 GMT, Emanuel Peter wrote: >> Hmmm, may be you are right. I think adding a comment at `PhiNode` saying that people must not rely on it being pinned at the `Region` for dependencies would be a wise move, I can't think of any reason for that besides value narrowing right now but being pinned is a property of `Phi` regardless and we should tell people not to rely on this behaviour. >> >> For this bug, I think a more general fix is to try to compare the type of the `Phi` with that of the input it is going to be replaced with. If the former is not wider than the latter then we add a `CastNode`, since the cast is only about value range, not strict dependency, we can use `CarryDependency` instead of `UnconditionalDependency`. Am I right? > > Ah, I only just now read the comments from @merykitty and you. Oops. 
Hmm. > > Yes it seems that the `CountedLoop` trip `phi` is special. That's maybe not great to have such implicit assumptions lying around. But not sure what would have been the better alternative. > > @rwestrel >> It reproduces this issue and is actually a better test case because it doesn't even need StressGCM: > > Are you saying this is another test? I'd really be happy if we had more tests for this case, because the current version seems fragile, since it is an almost perfect copy of a previous test that became ineffective this makes me even more nervous ? @eme64 the PR is ready for reviews in case you have time. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23617#issuecomment-2760716302 From mchevalier at openjdk.org Fri Mar 28 09:41:11 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 28 Mar 2025 09:41:11 GMT Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces In-Reply-To: <-Ri4lJUzCkI9yLG-kGwTGeAhd453SDgt_qvoB1iw4_A=.f3e126ab-a4ff-4f7f-80a7-c6e739cc6727@github.com> References: <-Ri4lJUzCkI9yLG-kGwTGeAhd453SDgt_qvoB1iw4_A=.f3e126ab-a4ff-4f7f-80a7-c6e739cc6727@github.com> Message-ID: On Thu, 27 Mar 2025 15:33:31 GMT, Tobias Hartmann wrote: >> If `TypeInstKlassPtr` represents an array type, it has to be `java.lang.Object`. By contraposition, if it is not `java.lang.Object`, we can conclude it is not an array, and we can skip some array checks, for instance. >> >> In this PR, we improve this deduction with interface-based reasoning: arrays implement only Cloneable and Serializable, so if a type implements anything else, it cannot be an array. >> >> This change partially reverts the changes from [JDK-8348631](https://bugs.openjdk.org/browse/JDK-8348631) (#23331) (in `LibraryCallKit::generate_array_guard_common`) and the test still passes. >> >> The way interfaces are checked might be done differently.
The current situation is a balance between visibility (not to leak too many things explicitly private), having not overly general methods for one use-case and avoiding too concrete (and brittle) interfaces. >> >> Tested with tier1..3, hs-precheckin-comp and hs-comp-stress >> >> Thanks, >> Marc > > src/hotspot/share/opto/type.cpp line 3684: > >> 3682: } >> 3683: >> 3684: bool TypeInterfaces::has_non_array_interface() const { > > What about using `TypeAryPtr::_array_interfaces->contains(_interfaces);` instead? Almost! `return !TypeAryPtr::_array_interfaces->contains(this);` `contains` is about `TypeInterfaces`, that is, a set of interfaces. So I just need to check that `this` is not a subset of the array interfaces. That should do it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24245#discussion_r2018248760 From epeter at openjdk.org Fri Mar 28 09:45:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Mar 2025 09:45:24 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v3] In-Reply-To: <6UoVmJrL7ZVcRvTUuCrPoqCJB0jcQJH7y-v2JWS7nlY=.2fccddcf-1843-4453-bf38-3e7f01bb39fc@github.com> References: <6UoVmJrL7ZVcRvTUuCrPoqCJB0jcQJH7y-v2JWS7nlY=.2fccddcf-1843-4453-bf38-3e7f01bb39fc@github.com> Message-ID: On Fri, 28 Mar 2025 05:01:29 GMT, Galder Zamarreño wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/lib/verify/Verify.java >> >> Co-authored-by: Galder Zamarreño > > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 79: > >> 77: * @throws VerifyException If the comparison fails. >> 78: */ >> 79: public static void checkEQ(Object a, Object b, boolean isFloatCheckWithRawBits, boolean isCheckWithArbitraryClasses) { > > Just a suggestion. One boolean might be ok, but once you start adding 2 booleans it seems like a bit of a code smell to me. Do you envision more options being added?
I would personally create a `VerifyOptions` record with the boolean flags options and pass that in. I'll experiment with a `VerifyOptions`, thanks for the suggestion! > test/hotspot/jtreg/testlibrary_tests/verify/tests/TestVerify.java line 691: > >> 689: } >> 690: >> 691: public static void checkNE(Object a, Object b) { > > If `checkEQ` lives in Verify, wouldn't it make sense to also have `checkNE` there? Seems like the natural place making the API symmetric. To be honest, I don't see the need for it being symmetric, i.e. for the API to have a `checkNE`. This `checkNE` method is just a convenience function to check that an exception is thrown if I expect it to. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2018254687 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2018253409 From jbhateja at openjdk.org Fri Mar 28 09:50:10 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 28 Mar 2025 09:50:10 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v4] In-Reply-To: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: > Adding following IR transforms for unsigned vector Min / Max nodes. > > => UMinV (UMinV(a, b), UMaxV(a, b)) => UMinV(a, b) > => UMinV (UMinV(a, b), UMaxV(b, a)) => UMinV(a, b) > => UMaxV (UMinV(a, b), UMaxV(a, b)) => UMaxV(a, b) > => UMaxV (UMinV(a, b), UMaxV(b, a)) => UMaxV(a, b) > => UMaxV (a, a) => a > => UMinV (a, a) => a > > New IR validation test accompanies the patch. > > This is a follow-up PR for https://github.com/openjdk/jdk/pull/20507 > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 - Review suggestions incorporated. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 - Updating copyright year of modified files - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 - Update IR transforms and tests - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 - 8342676: Unsigned Vector Min / Max transforms ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21604/files - new: https://git.openjdk.org/jdk/pull/21604/files/e9e09a5b..828c6c7a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21604&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21604&range=02-03 Stats: 148954 lines in 3481 files changed: 58742 ins; 69392 del; 20820 mod Patch: https://git.openjdk.org/jdk/pull/21604.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21604/head:pull/21604 PR: https://git.openjdk.org/jdk/pull/21604 From jbhateja at openjdk.org Fri Mar 28 09:50:11 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 28 Mar 2025 09:50:11 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v2] In-Reply-To: References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: <-XYHug3idc7lCoK5oApnFdYgdopWOsw10WEpRLJpOSU=.ee991c25-ef76-4ca9-8e09-b28f9b9aa1be@github.com> On Tue, 25 Feb 2025 17:49:33 GMT, Emanuel Peter wrote: > @jatin-bhateja Just ping me here if this is ready for another review ;) @eme64 , please have a look once you have some review cycles. 
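The identities listed in the PR description can be sanity-checked on scalars. The following is only a minimal sketch, assuming that one `int` lane of `UMinV` / `UMaxV` behaves like an unsigned min / max built on `Integer.compareUnsigned`; the class and method names (`UnsignedMinMaxIdentities`, `umin`, `umax`, `identitiesHold`) are hypothetical and are not part of the patch or of its IR test.

```java
public class UnsignedMinMaxIdentities {
    // Hypothetical scalar stand-ins for a single int lane of UMinV / UMaxV.
    static int umin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    static int umax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    // Checks the six identities from the PR description for one pair (a, b).
    static boolean identitiesHold(int a, int b) {
        return umin(umin(a, b), umax(a, b)) == umin(a, b)
            && umin(umin(a, b), umax(b, a)) == umin(a, b)
            && umax(umin(a, b), umax(a, b)) == umax(a, b)
            && umax(umin(a, b), umax(b, a)) == umax(a, b)
            && umax(a, a) == a
            && umin(a, a) == a;
    }

    public static void main(String[] args) {
        // Values around the signed/unsigned boundaries, where a signed
        // comparison and an unsigned comparison disagree.
        int[] samples = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE, 42, -42};
        for (int a : samples) {
            for (int b : samples) {
                if (!identitiesHold(a, b)) {
                    throw new AssertionError("identity violated for " + a + ", " + b);
                }
            }
        }
        System.out.println("all identities hold on the sample values");
    }
}
```

This only exercises the algebra on a handful of values; it says nothing about whether C2 actually applies the transforms, which is what the IR validation test in the patch is for.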
------------- PR Comment: https://git.openjdk.org/jdk/pull/21604#issuecomment-2760741608 From mchevalier at openjdk.org Fri Mar 28 09:53:13 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 28 Mar 2025 09:53:13 GMT Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 09:16:17 GMT, Marc Chevalier wrote: > If `TypeInstKlassPtr` represents an array type, it has to be `java.lang.Object`. By contraposition, if it is not `java.lang.Object`, we can conclude it is not an array, and we can skip some array checks, for instance. > > In this PR, we improve this deduction with interface-based reasoning: arrays implement only Cloneable and Serializable, so if a type implements anything else, it cannot be an array. > > This change partially reverts the changes from [JDK-8348631](https://bugs.openjdk.org/browse/JDK-8348631) (#23331) (in `LibraryCallKit::generate_array_guard_common`) and the test still passes. > > The way interfaces are checked might be done differently. The current situation is a balance between visibility (not to leak too many things explicitly private), having not overly general methods for one use-case and avoiding too concrete (and brittle) interfaces. > > Tested with tier1..3, hs-precheckin-comp and hs-comp-stress > > Thanks, > Marc I'm not sure how to write such an IR test. I'm looking at [TestArrayGuardWithInterfaces.java](https://github.com/openjdk/jdk/blob/3e9a7a4aed168422473c941ff5626d0d65aaadfa/test/hotspot/jtreg/compiler/intrinsics/TestArrayGuardWithInterfaces.java). I see the graphs of `test1` before and after, and the new one is smaller. But the nodes used are pretty much the same, or they don't feel clearly linked to interface checking: there is `DecodeNKlass` or `AddP`, but it doesn't seem obvious without having the graph in front of you that it actually checks something meaningful.
There are also fewer `If` nodes (2 instead of 3), but once again, the test seems brittle. I also see that there is no more `Return`, only `Halt`, since we can now prove the function cannot return normally. But the graph of `test2` ends with two `Halt`: traps everywhere, even if there are paths on which `test2` doesn't throw. So the lack of `Return` doesn't sound very robust. Overall, not sure what a good test would be. I can write a test that would not pass before and passes now, but I'm not convinced it would reliably catch regressions, or that it won't break for unrelated reasons. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24245#issuecomment-2760760131 From epeter at openjdk.org Fri Mar 28 09:56:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Mar 2025 09:56:14 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v8] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 21:18:32 GMT, Jatin Bhateja wrote: >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 >> >> The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). >> >> Summary of changes included with the patch: >> 1. C2 compiler New Vector IR creation. >> 2. Auto-vectorization support. >> 3. x86 backend implementation. >> 4. New IR verification test for each newly supported vector operation.
>> >> Following are the performance numbers of Float16OperationsBenchmark >> >> System : Intel(R) Xeon(R) Processor code-named Granite rapids >> Frequency fixed at 2.5 GHz >> >> >> Baseline >> Benchmark (vectorDim) Mode Cnt Score Error Units >> Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms >> Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms >> Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms >> Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms >> Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms >> Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms >> Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms >> Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms >> Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms >> Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms >> Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms >> Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms >> Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms >> Float16OperationsBenchmark.isInfiniteCMovBen... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Some re-factoring test/hotspot/jtreg/compiler/lib/generators/Generators.java line 375: > 373: * @return Random float16 generator. > 374: */ > 375: public Generator float16s() { Why do you not generate a `Float16` here? This here would probably conflict with a future `Short` generator which we might add in the future.... 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r2018270967 From rraj at openjdk.org Fri Mar 28 10:08:17 2025 From: rraj at openjdk.org (Rohit Arul Raj) Date: Fri, 28 Mar 2025 10:08:17 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v2] In-Reply-To: References: <_QvyAWuOP7uiUcWy0jEb6tN0CNIQOAQqZh8-7BxIWy4=.5a406f3e-ad92-492d-84c9-a3ef7e7941b2@github.com> Message-ID: On Wed, 26 Mar 2025 18:34:38 GMT, Srinivas Vamsi Parasa wrote: >> src/hotspot/cpu/x86/vm_version_x86.hpp line 778: >> >>> 776: static bool supports_avx512_simd_sort() { >>> 777: // Disable AVX512 version of SIMD Sort on AMD Zen4 Processors >>> 778: return ((is_intel() || (is_amd() && (cpu_family() > CPU_FAMILY_AMD_19H))) && supports_avx512dq()); } >> >> It's quite hard to parse. The following looks clearer to me: >> >> if (supports_avx512dq()) { >> // Disable AVX512 version of SIMD Sort on AMD Zen4 Processors. >> if (is_amd() && cpu_family() == CPU_FAMILY_AMD_19H) { >> return false; >> } >> return true; >> } >> return false; > > I second the suggested refactoring. Need to make sure the original `is_intel()` check is also included appropriately in the logic :) > It's quite hard to parse. The following looks clearer to me: > > ``` > if (supports_avx512dq()) { > // Disable AVX512 version of SIMD Sort on AMD Zen4 Processors. > if (is_amd() && cpu_family() == CPU_FAMILY_AMD_19H) { > return false; > } > return true; > } > return false; > ``` Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24053#discussion_r2018295775 From chagedorn at openjdk.org Fri Mar 28 10:25:44 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 28 Mar 2025 10:25:44 GMT Subject: RFR: 8350577: Fix missing Assertion Predicates when splitting loops [v3] In-Reply-To: References: Message-ID: <2pn7Amnyitbm2OHxM5COLNJAc3E27C1ShNWxi44Ul-Q=.a50d9a76-2277-48bc-a12f-3d97742aaa0b@github.com> > _Note: The actual fix is only ~80 changed lines - everything else is about tests._ > > After integrating many preparatory sub-tasks, I'm finally fixing the last outstanding Assertion Predicate issues with this patch. > > For more background about Assertion Predicates, have a look at the following [blog post](https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html). > > ### Maintain Assertion Predicates when Splitting a Loop > When performing Loop Predication on a counted loop, we create two Template Assertion Predicates for each hoisted range check. Whenever we split this loop as part of loop opts, we need to establish the Template Assertion Predicates at the new sub loops with the new loop init and stride and create Initialized Assertion Predicates from them. This ensures that a sub loop is folded when it becomes dead due to an impossible condition (e.g. always having a negative index for the hoisted range check in all loop iterations). > > #### Current State > Today, we are already covering most of the cases where Assertion Predicates are required - but we are still missing some. The following table describes the current state for the different loop splitting optimizations: > > | Loop Optimization | Template Assertion Predicate | Initialized Assertion Predicate | > | ------------------------ | --------------------------------------- | --------------------------------------- | > | Create Main Loop | ? | ? | > | Create Post Loop | ? | ? | > | Loop Unswitching | ? 
| _not required, same init, stride and, limit_ | > | Loop Unrolling | ? | ? | > | Range Check Elimination | ? | ? | > | Loop Peeling | ? | ? | > | Splitting Main Loop | ? | ? | > > Whenever we apply a loop optimization that does not establish Template Assertion Predicates, then all subsequent loop splitting optimizations on that loop cannot establish Template Assertion Predicates, either, and we fail to emit Initialized Assertion Predicates which can lead to a broken graph. > > #### Fixing Unsupported Cases > This patch provides fixes for the remaining unsupported cases as shown in the table above. With all the work done in previous PRs, the fix is quite straight forward: > - Remove the restriction that we don't clone Template Assertion Predicate in Loop Peeling and post loop creation. > - Remove the restriction that we only clone Template Assertion Predicate ... Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Remove UseLoopPredicate ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24246/files - new: https://git.openjdk.org/jdk/pull/24246/files/fb25c10c..7502a6d4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24246&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24246&range=01-02 Stats: 21 lines in 2 files changed: 11 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24246.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24246/head:pull/24246 PR: https://git.openjdk.org/jdk/pull/24246 From chagedorn at openjdk.org Fri Mar 28 10:25:44 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 28 Mar 2025 10:25:44 GMT Subject: RFR: 8350577: Fix missing Assertion Predicates when splitting loops [v2] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 08:09:55 GMT, Christian Hagedorn wrote: >> _Note: The actual fix is only ~80 changed lines - everything else is about tests._ >> >> After integrating many preparatory sub-tasks, I'm finally 
fixing the last outstanding Assertion Predicate issues with this patch. >> >> For more background about Assertion Predicates, have a look at the following [blog post](https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html). >> >> ### Maintain Assertion Predicates when Splitting a Loop >> When performing Loop Predication on a counted loop, we create two Template Assertion Predicates for each hoisted range check. Whenever we split this loop as part of loop opts, we need to establish the Template Assertion Predicates at the new sub loops with the new loop init and stride and create Initialized Assertion Predicates from them. This ensures that a sub loop is folded when it becomes dead due to an impossible condition (e.g. always having a negative index for the hoisted range check in all loop iterations). >> >> #### Current State >> Today, we are already covering most of the cases where Assertion Predicates are required - but we are still missing some. The following table describes the current state for the different loop splitting optimizations: >> >> | Loop Optimization | Template Assertion Predicate | Initialized Assertion Predicate | >> | ------------------------ | --------------------------------------- | --------------------------------------- | >> | Create Main Loop | ? | ? | >> | Create Post Loop | ? | ? | >> | Loop Unswitching | ? | _not required, same init, stride and, limit_ | >> | Loop Unrolling | ? | ? | >> | Range Check Elimination | ? | ? | >> | Loop Peeling | ? | ? | >> | Splitting Main Loop | ? | ? | >> >> Whenever we apply a loop optimization that does not establish Template Assertion Predicates, then all subsequent loop splitting optimizations on that loop cannot establish Template Assertion Predicates, either, and we fail to emit Initialized Assertion Predicates which can lead to a broken graph. >> >> #### Fixing Unsupported Cases >> This patch provides fixes for the remaining unsupported cases as shown in the table above. 
With all the work done in previous PRs, the fix is quite straight forward:
>> - Remove the restriction that we don't clone Template Assertion Predicate in Loop Peeling and post loop creation.
>> - Remove the rest...
>
> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fixing test failure 8353019

During testing, I've noticed that one of my new tests failed with `-XX:-UseLoopPredicate -XX:-UseCompressedOops`. The reason was that we still have some `UseLoopPredicate` guards in place for Assertion Predicate handling. But this is not correct: We could also insert new Template Assertion Predicates with Range Check Elimination. When we then turn such a main loop into a normal loop again with peeling (which happens in one of the test cases), we also need to update these Assertion Predicates - but we don't do that due to the `UseLoopPredicate` guards. We should remove these. I've added another `-XX:-UseLoopPredicate` run that uses `-Xcomp` but did not add `-XX:-UseCompressedOops`. This allows us to get more diverse coverage in the CI with different flag combos on top.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24246#issuecomment-2760843363

From duke at openjdk.org Fri Mar 28 10:28:20 2025
From: duke at openjdk.org (Manuel Hässig)
Date: Fri, 28 Mar 2025 10:28:20 GMT
Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1
Message-ID:

# Issue Summary

The `add_ring()` implementations of `OrINode` and `OrLNode` are missing the optimization that an OR with a value where all bits are ones (since we have signed integers in this case `~0 == -1`) will always yield all ones, i.e. `-1`.

# Changes

This PR makes the following straightforward changes:
- `Or(I|L)Node::add_ring()` returns `-1` if one of the two inputs is `-1`.
- Add `Or(I|L)` nodes to the IR framework.
- Add a regression IR test for the implemented optimization.
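The arithmetic identity behind this fold can be checked in plain Java. The sketch below only demonstrates that `x | -1` is `-1` for `int` and `long`. `OrAllOnesDemo` and its methods are hypothetical names for this illustration, and nothing here reflects how `add_ring()` is actually written in C2.

```java
// Hypothetical illustration of the identity x | -1 == -1; not C2 code.
final class OrAllOnesDemo {
    // -1 is the all-ones bit pattern (~0), so OR-ing with it sets every bit.
    static int orWithAllOnes(int x) {
        return x | -1;
    }

    static long orWithAllOnes(long x) {
        return x | -1L;
    }

    public static void main(String[] args) {
        int[] intSamples = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : intSamples) {
            System.out.println(x + " | -1 = " + orWithAllOnes(x));
        }
        long[] longSamples = {0L, 1L, -1L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (long x : longSamples) {
            System.out.println(x + "L | -1L = " + orWithAllOnes(x));
        }
    }
}
```

Because the result does not depend on `x`, a compiler that knows one input is `-1` can replace the whole OR with the constant.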
# Testing - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14110978686) - Ran tier1 through tier3 and Oracle internal testing ------------- Commit messages: - Add regression test - ir-framework: Add OrI, and OrL nodes - OrI/OrLNode: fold x | -1 Changes: https://git.openjdk.org/jdk/pull/24289/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24289&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352893 Stats: 108 lines in 3 files changed: 108 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24289/head:pull/24289 PR: https://git.openjdk.org/jdk/pull/24289 From epeter at openjdk.org Fri Mar 28 10:50:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Mar 2025 10:50:21 GMT Subject: RFR: 8350577: Fix missing Assertion Predicates when splitting loops [v3] In-Reply-To: <2pn7Amnyitbm2OHxM5COLNJAc3E27C1ShNWxi44Ul-Q=.a50d9a76-2277-48bc-a12f-3d97742aaa0b@github.com> References: <2pn7Amnyitbm2OHxM5COLNJAc3E27C1ShNWxi44Ul-Q=.a50d9a76-2277-48bc-a12f-3d97742aaa0b@github.com> Message-ID: On Fri, 28 Mar 2025 10:25:44 GMT, Christian Hagedorn wrote: >> _Note: The actual fix is only ~80 changed lines - everything else is about tests._ >> >> After integrating many preparatory sub-tasks, I'm finally fixing the last outstanding Assertion Predicate issues with this patch. >> >> For more background about Assertion Predicates, have a look at the following [blog post](https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html). >> >> ### Maintain Assertion Predicates when Splitting a Loop >> When performing Loop Predication on a counted loop, we create two Template Assertion Predicates for each hoisted range check. Whenever we split this loop as part of loop opts, we need to establish the Template Assertion Predicates at the new sub loops with the new loop init and stride and create Initialized Assertion Predicates from them. 
This ensures that a sub loop is folded when it becomes dead due to an impossible condition (e.g. always having a negative index for the hoisted range check in all loop iterations). >> >> #### Current State >> Today, we are already covering most of the cases where Assertion Predicates are required - but we are still missing some. The following table describes the current state for the different loop splitting optimizations: >> >> | Loop Optimization | Template Assertion Predicate | Initialized Assertion Predicate | >> | ------------------------ | --------------------------------------- | --------------------------------------- | >> | Create Main Loop | ? | ? | >> | Create Post Loop | ? | ? | >> | Loop Unswitching | ? | _not required, same init, stride and, limit_ | >> | Loop Unrolling | ? | ? | >> | Range Check Elimination | ? | ? | >> | Loop Peeling | ? | ? | >> | Splitting Main Loop | ? | ? | >> >> Whenever we apply a loop optimization that does not establish Template Assertion Predicates, then all subsequent loop splitting optimizations on that loop cannot establish Template Assertion Predicates, either, and we fail to emit Initialized Assertion Predicates which can lead to a broken graph. >> >> #### Fixing Unsupported Cases >> This patch provides fixes for the remaining unsupported cases as shown in the table above. With all the work done in previous PRs, the fix is quite straight forward: >> - Remove the restriction that we don't clone Template Assertion Predicate in Loop Peeling and post loop creation. >> - Remove the rest... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Remove UseLoopPredicate Nice catch @chhagedorn :) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24246#pullrequestreview-2725090741 From roland at openjdk.org Fri Mar 28 10:52:45 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 28 Mar 2025 10:52:45 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: <-L8Nx1MUEYQQe3WABsGddXYoM6G8Ov1Hl8FvlgOa0zI=.a6901ad0-e364-471b-9d88-03ce7c7a4f22@github.com> References: <-L8Nx1MUEYQQe3WABsGddXYoM6G8Ov1Hl8FvlgOa0zI=.a6901ad0-e364-471b-9d88-03ce7c7a4f22@github.com> Message-ID: On Thu, 20 Mar 2025 09:01:06 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/node.cpp line 3110: > >> 3108: >> 3109: Node* TypeNode::Ideal(PhaseGVN* phase, bool can_reshape) { >> 3110: if (can_reshape && Value(phase) == Type::TOP) { > > Why not use `phase->type(this)`? `Value` is called after `Ideal` so `phase->type(this)` is the previous type for this and if it was `top`, then we wouldn't be running this `Ideal`. What we want is the next type for the node to take action before it constant folds. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2018399929 From epeter at openjdk.org Fri Mar 28 10:56:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Mar 2025 10:56:30 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v4] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. > > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. 
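To make the first bullet of the description above concrete, here is a minimal sketch of a float comparison that treats different NaN encodings as equal. `NanAwareEqualsDemo` and `sameFloat` are hypothetical names, not the `Verify.checkEQ` API. In particular, comparing non-NaN values by raw bits (so that `0.0f` and `-0.0f` differ) is an assumption of this sketch, not something the thread states.

```java
// Hypothetical illustration; this is not the Verify.checkEQ implementation.
final class NanAwareEqualsDemo {
    // Two floats match if both are NaN (any encoding) or their raw bits are identical.
    static boolean sameFloat(float a, float b) {
        if (Float.isNaN(a) && Float.isNaN(b)) {
            return true;
        }
        return Float.floatToRawIntBits(a) == Float.floatToRawIntBits(b);
    }

    public static void main(String[] args) {
        // Two quiet-NaN bit patterns that differ in the low mantissa bit.
        float nanA = Float.intBitsToFloat(0x7fc00000);
        float nanB = Float.intBitsToFloat(0x7fc00001);
        System.out.println("NaN vs NaN:    " + sameFloat(nanA, nanB));
        System.out.println("1.0 vs 1.0:    " + sameFloat(1.0f, 1.0f));
        System.out.println("0.0 vs -0.0:   " + sameFloat(0.0f, -0.0f));
        System.out.println("NaN vs 1.0:    " + sameFloat(nanA, 1.0f));
    }
}
```

A plain `==` would report every NaN as unequal to everything, and a raw-bits comparison alone would reject NaNs with different payloads, which is why the NaN case is checked first.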
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  Verify.Options refactor for Galder

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24224/files
  - new: https://git.openjdk.org/jdk/pull/24224/files/00115d45..f2ed085a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=02-03

  Stats: 140 lines in 2 files changed: 36 ins; 3 del; 101 mod
  Patch: https://git.openjdk.org/jdk/pull/24224.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224

PR: https://git.openjdk.org/jdk/pull/24224

From epeter at openjdk.org Fri Mar 28 10:56:30 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 28 Mar 2025 10:56:30 GMT
Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v4]
In-Reply-To: <6UoVmJrL7ZVcRvTUuCrPoqCJB0jcQJH7y-v2JWS7nlY=.2fccddcf-1843-4453-bf38-3e7f01bb39fc@github.com>
References: <6UoVmJrL7ZVcRvTUuCrPoqCJB0jcQJH7y-v2JWS7nlY=.2fccddcf-1843-4453-bf38-3e7f01bb39fc@github.com>
Message-ID:

On Fri, 28 Mar 2025 05:10:41 GMT, Galder Zamarreño wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Verify.Options refactor for Galder
>
> Changes requested by galder (Author).

@galderz Thanks for the comments! I think I addressed / answered all of them. Could you please have another look?
:) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2760979793 From chagedorn at openjdk.org Fri Mar 28 10:57:22 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 28 Mar 2025 10:57:22 GMT Subject: RFR: 8350577: Fix missing Assertion Predicates when splitting loops [v3] In-Reply-To: <2pn7Amnyitbm2OHxM5COLNJAc3E27C1ShNWxi44Ul-Q=.a50d9a76-2277-48bc-a12f-3d97742aaa0b@github.com> References: <2pn7Amnyitbm2OHxM5COLNJAc3E27C1ShNWxi44Ul-Q=.a50d9a76-2277-48bc-a12f-3d97742aaa0b@github.com> Message-ID: <-bq0JnZ1aTX9STlbHugq8DfEo05zVnV5-qmznhyvteE=.5bc794d3-2509-4a02-b41b-6cceaf6f829a@github.com> On Fri, 28 Mar 2025 10:25:44 GMT, Christian Hagedorn wrote: >> _Note: The actual fix is only ~80 changed lines - everything else is about tests._ >> >> After integrating many preparatory sub-tasks, I'm finally fixing the last outstanding Assertion Predicate issues with this patch. >> >> For more background about Assertion Predicates, have a look at the following [blog post](https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html). >> >> ### Maintain Assertion Predicates when Splitting a Loop >> When performing Loop Predication on a counted loop, we create two Template Assertion Predicates for each hoisted range check. Whenever we split this loop as part of loop opts, we need to establish the Template Assertion Predicates at the new sub loops with the new loop init and stride and create Initialized Assertion Predicates from them. This ensures that a sub loop is folded when it becomes dead due to an impossible condition (e.g. always having a negative index for the hoisted range check in all loop iterations). >> >> #### Current State >> Today, we are already covering most of the cases where Assertion Predicates are required - but we are still missing some. 
The following table describes the current state for the different loop splitting optimizations: >> >> | Loop Optimization | Template Assertion Predicate | Initialized Assertion Predicate | >> | ------------------------ | --------------------------------------- | --------------------------------------- | >> | Create Main Loop | ? | ? | >> | Create Post Loop | ? | ? | >> | Loop Unswitching | ? | _not required, same init, stride and, limit_ | >> | Loop Unrolling | ? | ? | >> | Range Check Elimination | ? | ? | >> | Loop Peeling | ? | ? | >> | Splitting Main Loop | ? | ? | >> >> Whenever we apply a loop optimization that does not establish Template Assertion Predicates, then all subsequent loop splitting optimizations on that loop cannot establish Template Assertion Predicates, either, and we fail to emit Initialized Assertion Predicates which can lead to a broken graph. >> >> #### Fixing Unsupported Cases >> This patch provides fixes for the remaining unsupported cases as shown in the table above. With all the work done in previous PRs, the fix is quite straight forward: >> - Remove the restriction that we don't clone Template Assertion Predicate in Loop Peeling and post loop creation. >> - Remove the rest... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Remove UseLoopPredicate Thanks Emanuel! I'll rerun some testing again over the weekend. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24246#issuecomment-2760985656 From epeter at openjdk.org Fri Mar 28 11:20:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Mar 2025 11:20:28 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v4] In-Reply-To: References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: <8Swg7V3_i3NXqNiMUc5y0R6utQHFGci-fpMi1bicsdU=.a6abb303-14f0-4c8f-8a57-ec4f09ef4218@github.com> On Fri, 28 Mar 2025 09:50:10 GMT, Jatin Bhateja wrote: >> Adding following IR transforms for unsigned vector Min / Max nodes. >> >> => UMinV (UMinV(a, b), UMaxV(a, b)) => UMinV(a, b) >> => UMinV (UMinV(a, b), UMaxV(b, a)) => UMinV(a, b) >> => UMaxV (UMinV(a, b), UMaxV(a, b)) => UMaxV(a, b) >> => UMaxV (UMinV(a, b), UMaxV(b, a)) => UMaxV(a, b) >> => UMaxV (a, a) => a >> => UMinV (a, a) => a >> >> New IR validation test accompanies the patch. >> >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/20507 >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 > - Review suggestions incorporated. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 > - Updating copyright year of modified files > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 > - Update IR transforms and tests > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 > - 8342676: Unsigned Vector Min / Max transforms @jatin-bhateja Thanks for the updates! 
I have a few more comments :)

src/hotspot/share/opto/vectornode.cpp line 1045:

> 1043: }
> 1044:
> 1045: bool VectorNode::is_commutative() {

Why did you make this change? It seems unrelated, maybe it slipped in from another change set? Plus, it is not an accurate name, if `if (in(1)->_idx > in(2)->_idx) {` would fail, you would say that it is not commutative ... which is wrong ;)

src/hotspot/share/opto/vectornode.cpp line 2210:

> 2208: umax = n->in(2);
> 2209: }
> 2210: else if (lopc == Op_UMaxV && ropc == Op_UMinV) {

Suggestion:

    } else if (lopc == Op_UMaxV && ropc == Op_UMinV) {

src/hotspot/share/opto/vectornode.cpp line 2213:

> 2211: umin = n->in(2);
> 2212: umax = n->in(1);
> 2213: }

Suggestion:

    } else {
      // Either both Min or Max.
      return nullptr;
    }

That way, you don't need the check below: `umin != nullptr && umax != nullptr`.

src/hotspot/share/opto/vectornode.cpp line 2230:

> 2228: }
> 2229:
> 2230: return static_cast(n)->VectorNode::Ideal(phase, can_reshape);

Hmm. You now have a function `UMinMaxV_Ideal` that promises to do something specific (i.e. work with a MinMax case). But now you actually call into a more general method `VectorNode::Ideal`. I don't think that is a good approach ;)

Our usual approach is something like this:

    Node *StoreBNode::Ideal(PhaseGVN *phase, bool can_reshape){
      Node *progress = StoreNode::Ideal_masked_input(phase, 0xFF);
      if( progress != nullptr ) return progress;

      progress = StoreNode::Ideal_sign_extended_input(phase, 24);
      if( progress != nullptr ) return progress;

      // Finally check the default case
      return StoreNode::Ideal(phase, can_reshape);
    }

That way, you don't have to do the whole trick with `static_cast(n)->` either.

-------------

Changes requested by epeter (Reviewer).
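For readers who want to convince themselves of the unsigned min/max identities quoted from the PR description in this thread, the following scalar sketch checks them on a few values. It uses `Integer.compareUnsigned` to model the unsigned comparison lane by lane. `UnsignedMinMaxDemo`, `umin`, `umax`, and `identitiesHold` are hypothetical names for this illustration and do not correspond to the C2 `UMinV`/`UMaxV` node code.

```java
// Hypothetical scalar model of the UMinV/UMaxV identities; not C2 code.
final class UnsignedMinMaxDemo {
    static int umin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    static int umax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    // Checks all six transforms listed in the PR description for one pair (a, b).
    static boolean identitiesHold(int a, int b) {
        int mn = umin(a, b);
        int mx = umax(a, b);
        return umin(mn, mx) == mn            // UMinV(UMinV(a, b), UMaxV(a, b)) => UMinV(a, b)
            && umin(mn, umax(b, a)) == mn    // UMinV(UMinV(a, b), UMaxV(b, a)) => UMinV(a, b)
            && umax(mn, mx) == mx            // UMaxV(UMinV(a, b), UMaxV(a, b)) => UMaxV(a, b)
            && umax(mn, umax(b, a)) == mx    // UMaxV(UMinV(a, b), UMaxV(b, a)) => UMaxV(a, b)
            && umax(a, a) == a               // UMaxV(a, a) => a
            && umin(a, a) == a;              // UMinV(a, a) => a
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                if (!identitiesHold(a, b)) {
                    throw new AssertionError("identity failed for " + a + ", " + b);
                }
            }
        }
        System.out.println("all identities hold on the sample values");
    }
}
```

Note that `-1` is the largest unsigned value here, which is the case where a signed comparison would give a different answer.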
PR Review: https://git.openjdk.org/jdk/pull/21604#pullrequestreview-2725137975 PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r2018426699 PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r2018434561 PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r2018439335 PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r2018453144 From epeter at openjdk.org Fri Mar 28 11:20:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Mar 2025 11:20:29 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v4] In-Reply-To: <8Swg7V3_i3NXqNiMUc5y0R6utQHFGci-fpMi1bicsdU=.a6abb303-14f0-4c8f-8a57-ec4f09ef4218@github.com> References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> <8Swg7V3_i3NXqNiMUc5y0R6utQHFGci-fpMi1bicsdU=.a6abb303-14f0-4c8f-8a57-ec4f09ef4218@github.com> Message-ID: On Fri, 28 Mar 2025 11:10:30 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 >> - Review suggestions incorporated. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 >> - Updating copyright year of modified files >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 >> - Update IR transforms and tests >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 >> - 8342676: Unsigned Vector Min / Max transforms > > src/hotspot/share/opto/vectornode.cpp line 2230: > >> 2228: } >> 2229: >> 2230: return static_cast(n)->VectorNode::Ideal(phase, can_reshape); > > Hmm. You now have a function `UMinMaxV_Ideal` that promises to do something specific (i.e. work with a MinMax case). 
But now you actually call into a more general method `VectorNode::Ideal`. I don't think that is a good approach ;)
>
> Our usual approach is something like this:
>
>     Node *StoreBNode::Ideal(PhaseGVN *phase, bool can_reshape){
>       Node *progress = StoreNode::Ideal_masked_input(phase, 0xFF);
>       if( progress != nullptr ) return progress;
>
>       progress = StoreNode::Ideal_sign_extended_input(phase, 24);
>       if( progress != nullptr ) return progress;
>
>       // Finally check the default case
>       return StoreNode::Ideal(phase, can_reshape);
>     }
>
> That way, you don't have to do the whole trick with `static_cast(n)->` either.

Ah, and then `UMinMaxV_Ideal` also does not need the argument `can_reshape` any more!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r2018456647

From epeter at openjdk.org Fri Mar 28 11:20:29 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 28 Mar 2025 11:20:29 GMT
Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v4]
In-Reply-To:
References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> <8Swg7V3_i3NXqNiMUc5y0R6utQHFGci-fpMi1bicsdU=.a6abb303-14f0-4c8f-8a57-ec4f09ef4218@github.com>
Message-ID:

On Fri, 28 Mar 2025 11:11:56 GMT, Emanuel Peter wrote:

>> src/hotspot/share/opto/vectornode.cpp line 2230:
>>
>>> 2228: }
>>> 2229:
>>> 2230: return static_cast(n)->VectorNode::Ideal(phase, can_reshape);
>>
>> Hmm. You now have a function `UMinMaxV_Ideal` that promises to do something specific (i.e. work with a MinMax case). But now you actually call into a more general method `VectorNode::Ideal`.
I don't think that is a good approach ;)
>>
>> Our usual approach is something like this:
>>
>>     Node *StoreBNode::Ideal(PhaseGVN *phase, bool can_reshape){
>>       Node *progress = StoreNode::Ideal_masked_input(phase, 0xFF);
>>       if( progress != nullptr ) return progress;
>>
>>       progress = StoreNode::Ideal_sign_extended_input(phase, 24);
>>       if( progress != nullptr ) return progress;
>>
>>       // Finally check the default case
>>       return StoreNode::Ideal(phase, can_reshape);
>>     }
>>
>> That way, you don't have to do the whole trick with `static_cast(n)->` either.
>
> Ah, and then `UMinMaxV_Ideal` also does not need the argument `can_reshape` any more!

Suggestion:

    return nullptr;

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r2018466314

From duke at openjdk.org Fri Mar 28 11:24:43 2025
From: duke at openjdk.org (Anjian-Wen)
Date: Fri, 28 Mar 2025 11:24:43 GMT
Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v2]
In-Reply-To:
References:
Message-ID: <6TTfJ_5ui_Ls2jcfcV6VTXajsVND0x_Gwm6YmSQp-rY=.e9a8f90e-5b68-4321-9335-a055414d228b@github.com>

> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory.
This intrinsic optimizes about 15%-20% unsafe setmemory time Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: RISC-V: Intrinsify Unsafe::setMemory add unsafe memory check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23890/files - new: https://git.openjdk.org/jdk/pull/23890/files/17c357c9..2491d1a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=00-01 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890 PR: https://git.openjdk.org/jdk/pull/23890 From epeter at openjdk.org Fri Mar 28 11:28:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Mar 2025 11:28:24 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v8] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 21:18:32 GMT, Jatin Bhateja wrote: >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 >> >> The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). >> >> Summary of changes included with the patch: >> 1. C2 compiler New Vector IR creation. >> 2. Auto-vectorization support. >> 3. x86 backend implementation. >> 4. New IR verification test for each newly supported vector operation. 
>> >> Following are the performance numbers of Float16OperationsBenchmark >> >> System : Intel(R) Xeon(R) Processor code-named Granite rapids >> Frequency fixed at 2.5 GHz >> >> >> Baseline >> Benchmark (vectorDim) Mode Cnt Score Error Units >> Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms >> Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms >> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms >> Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms >> Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms >> Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms >> Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms >> Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms >> Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms >> Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms >> Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms >> Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms >> Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms >> Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms >> Float16OperationsBenchmark.isInfiniteCMovBen... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Some re-factoring test/hotspot/jtreg/compiler/lib/generators/Generators.java line 172: > 170: * Generates uniform float16s in the range of [lo, hi) (inclusive of lo, exclusive of hi). > 171: */ > 172: public RestrictableGenerator uniformFloat16s(short lo, short hi) { Hmm, passing `short` into this API seems a little strange to me. 
Because the range `[lo, hi)` doesn't really make sense... rather it is the bits by `lo` and `hi` interpreted as `Float16`... but isn't that a little cumbersome? Why not pass in `Float16` instead? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r2018477435 From epeter at openjdk.org Fri Mar 28 11:31:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 28 Mar 2025 11:31:40 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v8] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 21:18:32 GMT, Jatin Bhateja wrote: >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 >> >> The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). >> >> Summary of changes included with the patch: >> 1. C2 compiler New Vector IR creation. >> 2. Auto-vectorization support. >> 3. x86 backend implementation. >> 4. New IR verification test for each newly supported vector operation. 
>>
>> Following are the performance numbers of Float16OperationsBenchmark
>>
>> System : Intel(R) Xeon(R) Processor code-named Granite rapids
>> Frequency fixed at 2.5 GHz
>>
>>
>> Baseline
>> Benchmark (vectorDim) Mode Cnt Score Error Units
>> Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms
>> Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms
>> Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms
>> Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms
>> Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms
>> Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms
>> Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms
>> Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms
>> Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms
>> Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms
>> Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms
>> Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms
>> Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms
>> Float16OperationsBenchmark.isInfiniteCMovBen...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Some re-factoring

Changes requested by epeter (Reviewer).
test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 29: > 27: * @bug 8346236 > 28: * @summary Auto-vectorization support for various Float16 operations > 29: * @requires vm.compiler2.enabled Suggestion: I don't think C2 is a requirement, the IR framework can still run otherwise, and just disables the IR rules. test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 54: > 52: > 53: public static void main(String args[]) { > 54: TestFramework.runWithFlags("-XX:-TieredCompilation", "-Xbatch","--add-modules=jdk.incubator.vector"); Suggestion: TestFramework.runWithFlags("--add-modules=jdk.incubator.vector"); Were the other flags really required? ------------- PR Review: https://git.openjdk.org/jdk/pull/22755#pullrequestreview-2725227397 PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r2018495495 PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r2018496386 From mdoerr at openjdk.org Fri Mar 28 11:32:20 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Mar 2025 11:32:20 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v2] In-Reply-To: <6TTfJ_5ui_Ls2jcfcV6VTXajsVND0x_Gwm6YmSQp-rY=.e9a8f90e-5b68-4321-9335-a055414d228b@github.com> References: <6TTfJ_5ui_Ls2jcfcV6VTXajsVND0x_Gwm6YmSQp-rY=.e9a8f90e-5b68-4321-9335-a055414d228b@github.com> Message-ID: <2fgN3CsNZhntKwKFB0wASOYIMziKQLtpWlQwjYqgzCM=.c47036c7-9356-4486-9922-c54599ed33e9@github.com> On Fri, 28 Mar 2025 11:24:43 GMT, Anjian-Wen wrote: >> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory. This intrinsic optimizes about 15%-20% unsafe setmemory time > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > RISC-V: Intrinsify Unsafe::setMemory > > add unsafe memory check Unfortunately, 'UnsafeMemoryAccessMark' doesn't work like this.
Your current implementation marks the code region in `unsafe_setmemory`, but the accesses are done in `StubRoutines::_jbyte_fill` which are outside of that region. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2761102248 From shade at openjdk.org Fri Mar 28 12:35:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 28 Mar 2025 12:35:54 GMT Subject: RFR: 8353176: C1: x86 patching stub always calls Thread::current() Message-ID: Noticed this while looking at compiled code density. In C1 PatchingStub code, we _always_ perform a runtime call to `Thread::current()`, even though we can rely on `r15_thread` to be available. This is because we are calling to `MacroAssembler::get_thread()`, which is always doing slowpath. This kind of accident would be less likely / impossible once we cleanup uses of `MacroAssembler::get_thread()` with [JDK-8353174](https://bugs.openjdk.org/browse/JDK-8353174). Additional testing: - [x] Linux x86_64 server fastdebug, `tier1` - [x] Linux x86_64 server fastdebug, `tier1` + `-XX:TieredStopAtLevel=1` - [ ] Linux x86_64 server fastdebug, `all` - [ ] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` ------------- Commit messages: - More comprehensive fix - Fix Changes: https://git.openjdk.org/jdk/pull/24291/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24291&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353176 Stats: 10 lines in 1 file changed: 0 ins; 6 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24291.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24291/head:pull/24291 PR: https://git.openjdk.org/jdk/pull/24291 From shade at openjdk.org Fri Mar 28 12:35:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 28 Mar 2025 12:35:54 GMT Subject: RFR: 8353176: C1: x86 patching stub always calls Thread::current() In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 11:05:33 GMT, Aleksey Shipilev wrote: > Noticed this while looking at compiled code 
density. In C1 PatchingStub code, we _always_ perform a runtime call to `Thread::current()`, even though we can rely on `r15_thread` to be available. This is because we are calling to `MacroAssembler::get_thread()`, which is always doing slowpath. > > This kind of accident would be less likely / impossible once we cleanup uses of `MacroAssembler::get_thread()` with [JDK-8353174](https://bugs.openjdk.org/browse/JDK-8353174). > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1` + `-XX:TieredStopAtLevel=1` > - [ ] Linux x86_64 server fastdebug, `all` > - [ ] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` Sample code density improvements: $ for I in 1 2 3; do build/linux-x86_64-server-release/images/jdk/bin/java \ -XX:TieredStopAtLevel=${I} -Xcomp -XX:+CITime \ Hello 2>&1 | grep "nmethod code size"; done # baseline tier1: 463424 bytes tier2: 499456 bytes tier3: 1051016 bytes # patched tier1: 436632 bytes ; -6.1% tier2: 472808 bytes ; -5.6% tier3: 1024376 bytes ; -2.6% ------------- PR Comment: https://git.openjdk.org/jdk/pull/24291#issuecomment-2761044476 From roland at openjdk.org Fri Mar 28 12:41:11 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 28 Mar 2025 12:41:11 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 10:00:37 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/node.cpp line 3100: > >> 3098: loop->register_new_node(frame, igvn->C->start()); >> 3099: } >> 3100: Node* halt = new HaltNode(c, frame, "dead path discovered by TypeNode"); > > The more info we can attach to the `HaltNode`, the better. It would make debugging easier if it is ever hit. Is there anything in particular you think makes sense adding? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2018583157 From roland at openjdk.org Fri Mar 28 12:49:09 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 28 Mar 2025 12:49:09 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 12:37:34 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/node.cpp line 3100: >> >>> 3098: loop->register_new_node(frame, igvn->C->start()); >>> 3099: } >>> 3100: Node* halt = new HaltNode(c, frame, "dead path discovered by TypeNode"); >> >> The more info we can attach to the `HaltNode`, the better. It would make debugging easier if it is ever hit. > > Is there anything in particular you think makes sense adding? Maybe what phase (igvn or ccp) found the dead path might be useful for diagnosis. Not sure what else would be. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2018593940 From duke at openjdk.org Fri Mar 28 12:50:18 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 28 Mar 2025 12:50:18 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v2] In-Reply-To: <2fgN3CsNZhntKwKFB0wASOYIMziKQLtpWlQwjYqgzCM=.c47036c7-9356-4486-9922-c54599ed33e9@github.com> References: <6TTfJ_5ui_Ls2jcfcV6VTXajsVND0x_Gwm6YmSQp-rY=.e9a8f90e-5b68-4321-9335-a055414d228b@github.com> <2fgN3CsNZhntKwKFB0wASOYIMziKQLtpWlQwjYqgzCM=.c47036c7-9356-4486-9922-c54599ed33e9@github.com> Message-ID: On Fri, 28 Mar 2025 11:29:56 GMT, Martin Doerr wrote: > Unfortunately, 'UnsafeMemoryAccessMark' doesn't work like this. Your current implementation marks the code region in `unsafe_setmemory`, but the accesses are done in `StubRoutines::_jbyte_fill` which are outside of that region. Thanks for the reminder. 
I'm not sure I understand this correctly. Do you mean that 'generate_fill(_jbyte_fill)' starts a new block with a new 'pc' and a new 'enter', while the 'UnsafeMemoryAccessMark' in 'generate_unsafe_setmemory' only covers the old block, so it does not work correctly? I will spend some more time sorting out this logic and try to fix it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2761273256 From duke at openjdk.org Fri Mar 28 13:03:48 2025 From: duke at openjdk.org (Zihao Lin) Date: Fri, 28 Mar 2025 13:03:48 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v4] In-Reply-To: References: Message-ID: > This patch removes the slice parameter from LoadNode::make. > > Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thanks a lot! Zihao Lin has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: 8344116: C2: remove slice parameter from LoadNode::make ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24258/files - new: https://git.openjdk.org/jdk/pull/24258/files/08c1a382..f6b2fbec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=02-03 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From mdoerr at openjdk.org Fri Mar 28 13:10:21 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Mar 2025 13:10:21 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v2] In-Reply-To: <6TTfJ_5ui_Ls2jcfcV6VTXajsVND0x_Gwm6YmSQp-rY=.e9a8f90e-5b68-4321-9335-a055414d228b@github.com> References: <6TTfJ_5ui_Ls2jcfcV6VTXajsVND0x_Gwm6YmSQp-rY=.e9a8f90e-5b68-4321-9335-a055414d228b@github.com> Message-ID: <50Kgvlj9a3UikgLCH4w-QSUsYv-_suxpE6BW67d7cjw=.367e5c5d-bb74-42a7-a6f2-ce4129306a3b@github.com> On Fri, 28 Mar 2025 11:24:43 GMT, Anjian-Wen wrote: >> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory. This intrinsic optimizes about 15%-20% unsafe setmemory time > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > RISC-V: Intrinsify Unsafe::setMemory > > add unsafe memory check It should work if you move `UnsafeMemoryAccessMark` into `generate_fill(_jbyte_fill)`. Not sure if that is desired. Also note that enter + leave are pointless in your code.
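
Editorial illustration of the point about marked regions, as a minimal sketch rather than HotSpot code: it assumes the mark records a `[start, end)` range of code addresses together with a recovery address, and that a fault is only recovered when the faulting pc lies inside a recorded range. The class `MarkedRegionModel`, its methods, and all addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a table of marked code regions.
public class MarkedRegionModel {
    // One marked region: faults with start <= pc < end resume at errorExit.
    static final class Region {
        final long start;
        final long end;
        final long errorExit;

        Region(long start, long end, long errorExit) {
            this.start = start;
            this.end = end;
            this.errorExit = errorExit;
        }
    }

    private final List<Region> regions = new ArrayList<>();

    // Records a region, in the way a scoped mark would on construction/destruction.
    public void mark(long start, long end, long errorExit) {
        regions.add(new Region(start, end, errorExit));
    }

    // Returns the recovery address for a faulting pc, or -1 if no region covers it.
    public long continuationFor(long faultPc) {
        for (Region r : regions) {
            if (faultPc >= r.start && faultPc < r.end) {
                return r.errorExit;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        MarkedRegionModel table = new MarkedRegionModel();
        // Assumed layout: the setMemory stub occupies [0x1000, 0x1040) and is marked;
        // the separate fill stub it jumps to occupies [0x2000, 0x2100) and is not.
        table.mark(0x1000L, 0x1040L, 0x1038L);
        // A fault inside the marked stub finds a recovery address.
        System.out.println(Long.toHexString(table.continuationFor(0x1010L)));
        // A fault inside the fill stub is outside every marked range.
        System.out.println(table.continuationFor(0x2050L));
    }
}
```

In this model the second lookup finds no recovery address, which corresponds to the situation described: the stores happen in the fill stub, outside the range the mark covers. Placing the mark around the code that actually performs the stores would make the lookup succeed.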
------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2761317827 From rcastanedalo at openjdk.org Fri Mar 28 13:27:40 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 28 Mar 2025 13:27:40 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> Message-ID: On Mon, 24 Mar 2025 15:33:34 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning.
A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Extend example with offset register mask As we discussed offline, the test coverage of register mask operations with extended dynamic parts, non-zero offsets, etc. is fairly low (basically limited to the new JTReg tests included in this changeset). To increase coverage, I have extended `test_regmask.cpp` with tests that perform random operations on a register mask and on a reference bit set and check that the result is equivalent on both data structures. Here is the extension: https://github.com/openjdk/jdk/commit/4ee703f1ab73f8f43d4603d7fa88dcc8f4950ec0. 
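
Editorial illustration of the testing approach described here, as a minimal Java sketch and not the gtest code from the linked commit: the same random operations are applied to a structure under test and to a trusted reference, and the two are compared after every step. `GrowableMask` is a hypothetical stand-in for a growable register mask, with `java.util.BitSet` acting as the reference bit set.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Random;

public class RandomMaskTest {
    // Hypothetical growable bit mask, standing in for the structure under test.
    static final class GrowableMask {
        private long[] words = new long[1];

        // Grows the backing array so that the given bit index is addressable.
        private void ensure(int bit) {
            int needed = (bit >>> 6) + 1;
            if (needed > words.length) {
                words = Arrays.copyOf(words, Math.max(needed, words.length * 2));
            }
        }

        void insert(int bit) {
            ensure(bit);
            words[bit >>> 6] |= 1L << (bit & 63);
        }

        void remove(int bit) {
            if ((bit >>> 6) < words.length) {
                words[bit >>> 6] &= ~(1L << (bit & 63));
            }
        }

        boolean member(int bit) {
            return (bit >>> 6) < words.length
                && (words[bit >>> 6] & (1L << (bit & 63))) != 0;
        }
    }

    // Applies identical random operations to both structures and
    // returns true only if they agree on every bit after every step.
    static boolean agree(long seed, int steps, int maxBit) {
        Random rnd = new Random(seed);
        GrowableMask mask = new GrowableMask();
        BitSet reference = new BitSet();
        for (int s = 0; s < steps; s++) {
            int bit = rnd.nextInt(maxBit);
            if (rnd.nextBoolean()) {
                mask.insert(bit);
                reference.set(bit);
            } else {
                mask.remove(bit);
                reference.clear(bit);
            }
            for (int b = 0; b < maxBit; b++) {
                if (mask.member(b) != reference.get(b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agree(42L, 2_000, 300));
    }
}
```

Injecting a deliberate bug, for example dropping the `ensure(bit)` call's growth, is a quick way to check that such a test can actually fail, which mirrors the fault-injection check mentioned in this message.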
I ran the random tests a few times on different platforms and could not find any failure, which gives good confidence in the correctness of the register mask operation changes. I also tested the effectiveness of the tests themselves by injecting a few failures in the register mask implementation and confirming their detection. Feel free to include the test extensions in this changeset (you might want to go through the code and clean it up a bit first, though, e.g. for naming consistency). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2761357538 From dlunden at openjdk.org Fri Mar 28 15:03:28 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 28 Mar 2025 15:03:28 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> Message-ID: On Mon, 24 Mar 2025 15:33:34 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this.
To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... 
> > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Extend example with offset register mask Thanks for your contributions Roberto; random testing of register mask operations is very useful. I will have a closer look at the patch next week and merge it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2761619668 From mdoerr at openjdk.org Fri Mar 28 15:28:08 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Mar 2025 15:28:08 GMT Subject: RFR: 8353176: C1: x86 patching stub always calls Thread::current() In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 11:05:33 GMT, Aleksey Shipilev wrote: > Noticed this while looking at compiled code density. In C1 PatchingStub code, we _always_ perform a runtime call to `Thread::current()`, even though we can rely on `r15_thread` to be available. This currently emits a huge sequence of instructions for register save/restore and the call itself. > > Current code calls to `MacroAssembler::get_thread()`, which is always doing that slowpath. This kind of accident would be less likely / impossible once we cleanup uses of `MacroAssembler::get_thread()` with [JDK-8353174](https://bugs.openjdk.org/browse/JDK-8353174). > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1` + `-XX:TieredStopAtLevel=1` > - [ ] Linux x86_64 server fastdebug, `all` > - [ ] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` LGTM. ------------- Marked as reviewed by mdoerr (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24291#pullrequestreview-2725918674 From roland at openjdk.org Fri Mar 28 15:32:30 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 28 Mar 2025 15:32:30 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v3] In-Reply-To: References: Message-ID: > This is primarily motivated by 8275202 (C2: optimize out more > redundant conditions). In the following code snippet: > > > int[] array = new int[arraySize]; > if (j <= arraySize) { > if (i >= 0) { > if (i < j) { > int v = array[i]; > > > (`arraySize` is a constant) > > at the range check, `j` is known to be in `[min, arraySize]`. As a > consequence, `i` is known to be in `[0, arraySize-1]`. The range check > can be eliminated. > > Now, if later, `i` constant folds to some value that's positive but > out of range for the array: > > - if that happens when the new pass runs, then it can prove that: > > if (i < j) { > > is never taken. > > - if that happens during IGVN or CCP however, that condition is not > constant folded. And because the range check was removed, there's no > guard protecting the range check `CastII`. It becomes `top` and, as > a result, the graph can become broken. > > What I propose here is that when the `CastII` becomes dead, any CFG > path that uses the `CastII` node is made unreachable. So in pseudo code: > > > int[] array = new int[arraySize]; > if (j <= arraySize) { > if (i >= 0) { > if (i < j) { > halt(); > > > Finding the CFG paths is implemented in the patch by following the > uses of the node until a CFG node or a `Phi` is encountered. > > The patch applies this to all `Type` nodes as, with 8275202, I also ran > into some rare corner cases with other types of nodes. The exception is > `Phi` nodes which may not be as easy to handle (and for which I had no > issue with 8275202). > > Finally, the patch includes a test case that's unrelated to the > discussion of 8275202 above.
In that test case, a `CastII` becomes top > but the test that guards it doesn't constant fold. The root cause is a > transformation of: > > > (CastII (AddI > > > into > > > (AddI (CastII ) (CastII) > > > which causes the resulting node to have a wider type. The `CastII` > captures a type before the transformation above happens. Once it has > happened, the guard for the `CastII` can't be constant folded when an > out of bound value occurs. > > This is likely fixable some other way (even though it doesn't seem > straightforward). Given the long history of similar issues (and the > test case that shows that there are more hiding), I think it would > make sense to try some other way of approaching them. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - review - Merge branch 'master' into JDK-8349479 - review - whitespace - fix & test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23468/files - new: https://git.openjdk.org/jdk/pull/23468/files/73dd6d84..f310865f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23468&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23468&range=01-02 Stats: 177870 lines in 4403 files changed: 75946 ins; 76167 del; 25757 mod Patch: https://git.openjdk.org/jdk/pull/23468.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23468/head:pull/23468 PR: https://git.openjdk.org/jdk/pull/23468 From roland at openjdk.org Fri Mar 28 15:32:30 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 28 Mar 2025 15:32:30 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 12:45:41 GMT, Roland Westrelin wrote: >> Is there anything in particular you think makes sense adding?
> > Maybe what phase (igvn or ccp) found the dead path might be useful for diagnosis. Not sure what else would be. New commit includes the phase in the message. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2018889926 From roland at openjdk.org Fri Mar 28 15:32:30 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 28 Mar 2025 15:32:30 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v3] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 15:28:44 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]`. As a >> consequence, `i` is known to be in `[0, arraySize-1]`. The range check >> can be eliminated. >> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. >> >> What I propose here is that when the `CastII` becomes dead, any CFG >> path that uses the `CastII` node is made unreachable. So in pseudo code: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered.
>> >> The patch applies this to all `Type` nodes as, with 8275202, I also ran >> into some rare corner cases with other types of nodes. The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). >> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII) >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out of bound value occurs. >> >> This is likely fixable some other way (even though it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that there are more hiding), I think it would >> make sense to try some other way of approaching them. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8349479 > - review > - whitespace > - fix & test New commit adds a command-line option to turn this off.
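
Editorial illustration of the guard structure quoted above, written out as a runnable Java method. It is a source-level sketch only and does not reproduce any C2 behavior; the class name `GuardedAccess`, the constant value 8, and the `-1` fallback are assumptions made for the example.

```java
public class GuardedAccess {
    // Stands in for the compile-time constant `arraySize` from the quoted snippet.
    static final int ARRAY_SIZE = 8;

    // Returns array[i] when all three guards pass, otherwise -1.
    // The array is assumed to have length ARRAY_SIZE.
    static int guardedLoad(int[] array, int i, int j) {
        if (j <= ARRAY_SIZE) {
            if (i >= 0) {
                if (i < j) {
                    // Here 0 <= i < j <= ARRAY_SIZE, so i is in [0, ARRAY_SIZE - 1]
                    // and the implicit bounds check on the load cannot fail.
                    return array[i];
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] array = new int[ARRAY_SIZE];
        for (int k = 0; k < ARRAY_SIZE; k++) {
            array[k] = k * 10;
        }
        // Probe values on both sides of the valid range; no call may throw.
        for (int j = -2; j <= ARRAY_SIZE + 2; j++) {
            for (int i = -2; i <= ARRAY_SIZE + 2; i++) {
                guardedLoad(array, i, j);
            }
        }
        System.out.println(guardedLoad(array, 3, 5));   // all guards pass
        System.out.println(guardedLoad(array, 9, 12));  // first guard fails
    }
}
```

The exhaustive probe shows the range argument at the source level: any `i` that reaches the load is already inside the array. The quoted discussion concerns the compiler's internal graph once that range check has been removed, where an out-of-range constant `i` makes the load's path dead without the `i < j` test folding.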
------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2761697105 From roland at openjdk.org Fri Mar 28 15:32:30 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 28 Mar 2025 15:32:30 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v2] In-Reply-To: <-L8Nx1MUEYQQe3WABsGddXYoM6G8Ov1Hl8FvlgOa0zI=.a6901ad0-e364-471b-9d88-03ce7c7a4f22@github.com> References: <-L8Nx1MUEYQQe3WABsGddXYoM6G8Ov1Hl8FvlgOa0zI=.a6901ad0-e364-471b-9d88-03ce7c7a4f22@github.com> Message-ID: On Thu, 20 Mar 2025 08:27:04 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > test/hotspot/jtreg/compiler/c2/TestGuardOfCastIIDoesntFold.java line 30: > >> 28: * @run main/othervm -XX:-TieredCompilation -XX:-BackgroundCompilation -XX:-UseOnStackReplacement >> 29: * -XX:CompileCommand=dontinline,TestGuardOfCastIIDoesntFold::notInlined >> 30: * TestGuardOfCastIIDoesntFold > > Nit: can we have a run without flags, please ;) Done in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2018890260 From kvn at openjdk.org Fri Mar 28 15:38:08 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 28 Mar 2025 15:38:08 GMT Subject: RFR: 8353176: C1: x86 patching stub always calls Thread::current() In-Reply-To: References: Message-ID: <0YhvCZbgqtIymdNtLbs-58zCmg63pltlvzoLV2GZ-Hg=.95bcb696-7fce-4c86-b22d-5c95441c6eb8@github.com> On Fri, 28 Mar 2025 11:05:33 GMT, Aleksey Shipilev wrote: > Noticed this while looking at compiled code density. In C1 PatchingStub code, we _always_ perform a runtime call to `Thread::current()`, even though we can rely on `r15_thread` to be available. This currently emits a huge sequence of instructions for register save/restore and the call itself. > > Current code calls to `MacroAssembler::get_thread()`, which is always doing that slowpath. 
This kind of accident would be less likely / impossible once we cleanup uses of `MacroAssembler::get_thread()` with [JDK-8353174](https://bugs.openjdk.org/browse/JDK-8353174). > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1` + `-XX:TieredStopAtLevel=1` > - [ ] Linux x86_64 server fastdebug, `all` > - [ ] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24291#pullrequestreview-2725944555 From kvn at openjdk.org Fri Mar 28 15:51:44 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 28 Mar 2025 15:51:44 GMT Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 In-Reply-To: References: Message-ID: <2Ftvot6pMHAix-eHQG6_JU46mEqd33PzkE6XbVm86_E=.2a7da698-a5f4-4a05-9cee-a12ec713c7ac@github.com> On Fri, 28 Mar 2025 10:21:57 GMT, Manuel Hässig wrote: > # Issue Summary > > The `add_ring()` implementations of `OrINode` and `OrLNode` are missing the optimization that an or with a value where all bits are ones (since we have signed integers, in this case `~0 == -1`) will always yield all ones. > > # Changes > > This PR makes the following straightforward changes: > - `Or(I|L)Node::add_ring()` returns `-1` if one of the two inputs is `-1`. > - Add `Or(I|L)` nodes to the IR framework. > - Add a regression IR test for the implemented optimization. > > # Testing > > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14110978686) > - Ran tier1 through tier3 and Oracle internal testing Good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24289#pullrequestreview-2725977245 From roland at openjdk.org Fri Mar 28 16:04:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 28 Mar 2025 16:04:41 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v2] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 12:02:31 GMT, Christian Hagedorn wrote: >>> Drive by comments: Is `-XX:-UseOnStackReplacement` required to reproduce the issue? >> >> No, it's not. I trimmed the list of options for the tests. >> >>> There was also a crash when running with `-XX:+TraceLoopOpts`. Can you also add a run with that flag to verify that this patch also fixes that? >> >> Added. The `TraceLoopOpts` crash reproduces: the code hits a malformed counted loop. I tweaked the printing code. > >> Added. The TraceLoopOpts crash reproduces: the code hits a malformed counted loop. I tweaked the printing code. > > Is the malformed counted loop expected or a different issue to look into? It doesn't look like an actual issue to me. `PhiNode::Value` manages to narrow the trip `phi`'s type of the pre loop enough that it's a constant. So the loop no longer has the expected counted loop shape, but the loop exit condition that should constant fold doesn't, because it's guarded by an `Opaque1` node. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2018941323 From cslucas at openjdk.org Fri Mar 28 16:18:16 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 28 Mar 2025 16:18:16 GMT Subject: RFR: 8334046: Set different values for CompLevel_any and CompLevel_all Message-ID: Please review this trivial patch to set different values for CompLevel_any and CompLevel_all. Setting different values for these fields makes the implementation of [this other issue](https://bugs.openjdk.org/browse/JDK-8313713) much cleaner/easier. Tested on OSX/Linux Aarch64/x86_64 with JTREG.
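
Editorial sketch (not part of the original message): the request above is to give two sentinel constants different values. A generic illustration of why that helps might look like the following. The class name, the `classify` helper and every numeric value are hypothetical and are not taken from HotSpot's CompLevel definitions; the point is only that two names sharing one value cannot be told apart by code that receives the value.

```java
// Illustrative sketch only: the constants below are hypothetical and are
// not taken from HotSpot's CompLevel definitions.
class SentinelAliasSketch {
    // Classifies a level against two sentinel values supplied by the caller.
    static String classify(int level, int any, int all) {
        if (level == any) return "any";
        if (level == all) return "all";
        return "level " + level;
    }

    public static void main(String[] args) {
        // Both sentinels share one value: a request meant as "all" is reported as "any".
        System.out.println(classify(-1, -1, -1));
        // Distinct sentinel values: the two requests can be told apart.
        System.out.println(classify(-2, -1, -2));
    }
}
```

With aliased values, the second branch in `classify` is unreachable, so any consumer that needs to treat the two cases differently has to carry extra state; distinct values avoid that.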
------------- Commit messages: - Set different values for comp_level any/all Changes: https://git.openjdk.org/jdk/pull/24298/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24298&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334046 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24298.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24298/head:pull/24298 PR: https://git.openjdk.org/jdk/pull/24298 From shade at openjdk.org Fri Mar 28 16:20:27 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 28 Mar 2025 16:20:27 GMT Subject: RFR: 8351156: C1: Remove FPU stack support after 32-bit x86 removal In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 16:35:01 GMT, Vladimir Kozlov wrote: >> C1 has the 32-bit x86 specific code that supports x87 FPU stack allocations. With 32-bit x86 port removed, we can clean up those parts. 64-bit x86 does not need x87 FPU, since it is baselined on SSE2 and using XMM registers instead. >> >> There are lots of deeper cleanups possible, this PR focuses on removing the x87 FPU allocation/uses in C1. Note current C1 nomenclature is confusing. On all arches, "FPU" means floating-point _registers_. That is _except_ on x86, where "FPU" means x87 FPU stack, and "XMM" means floating-point registers. This is why we only touch "FPU" code on other architectures only in a light manner. After all 32-bit x86 cleanups land, we might consider renaming "XMM" -> "FPU" in x86 C1 to match the common nomenclature. >> >> Brief tour of changes: >> - FPU stack simulator is not needed anymore, so I removed it and the related infrastructure >> - Lots of 32-bit specific paths that touch x87 FPU registers are pruned >> - Related LIR nodes like `lir_fxch`, `lir_fld`, `lir_fpop_raw` are pruned >> - Simplified the API that is no longer needed, e.g. dropping `pop_fpu_stack` >> >> This PR would likely conflict with some in-flight cleanups, so it would require merges later. 
Take a look meanwhile. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1` + `-XX:TieredStopAtLevel=1` >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` > >> On all arches, "FPU" means floating-point registers. > > Which is wrong I think. It should be "FPR". May be file RFE to rename `LIR_Opr::is_*_fpu()` methods to `LIR_Opr::is_*_fpr()` Are you still good with this, @vnkozlov? I would like to integrate it sooner to unblock some dependent cleanup work. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24274#issuecomment-2761826218 From kvn at openjdk.org Fri Mar 28 16:33:24 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 28 Mar 2025 16:33:24 GMT Subject: RFR: 8351156: C1: Remove FPU stack support after 32-bit x86 removal [v2] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 19:41:48 GMT, Aleksey Shipilev wrote: >> C1 has the 32-bit x86 specific code that supports x87 FPU stack allocations. With 32-bit x86 port removed, we can clean up those parts. 64-bit x86 does not need x87 FPU, since it is baselined on SSE2 and using XMM registers instead. >> >> There are lots of deeper cleanups possible, this PR focuses on removing the x87 FPU allocation/uses in C1. Note current C1 nomenclature is confusing. On all arches, "FPU" means floating-point _registers_. That is _except_ on x86, where "FPU" means x87 FPU stack, and "XMM" means floating-point registers. This is why we only touch "FPU" code on other architectures only in a light manner. After all 32-bit x86 cleanups land, we might consider renaming "XMM" -> "FPU" in x86 C1 to match the common nomenclature. 
>> >> Brief tour of changes: >> - FPU stack simulator is not needed anymore, so I removed it and the related infrastructure >> - Lots of 32-bit specific paths that touch x87 FPU registers are pruned >> - Related LIR nodes like `lir_fxch`, `lir_fld`, `lir_fpop_raw` are pruned >> - Simplified the API that is no longer needed, e.g. dropping `pop_fpu_stack` >> >> This PR would likely conflict with some in-flight cleanups, so it would require merges later. Take a look meanwhile. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1` + `-XX:TieredStopAtLevel=1` >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8351156-x86-c1-fpustack > - Fixing build failures after reg2stack > - Remove remaining FPU uses in LIRAssembler_x86 > - Touchups > - Initial fix Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24274#pullrequestreview-2726088872 From roland at openjdk.org Fri Mar 28 16:38:00 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 28 Mar 2025 16:38:00 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v15] In-Reply-To: References: Message-ID: <-rQb2ZR6hrzt-7Q0EwQqlxjvVuDQQOgYqzX3tZVPL38=.2577f4e0-c35f-434e-88d1-f0db41bb5364@github.com> > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. 
If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loops. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), so we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate.
> > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 41 commits: - merge fix - Merge branch 'master' into JDK-8342692 - merge fix - Merge branch 'master' into JDK-8342692 - merge - Merge branch 'master' into JDK-8342692 - Merge branch 'master' into JDK-8342692 - whitespace - Merge branch 'master' into JDK-8342692 - TestMemorySegment test fix - ... and 31 more: https://git.openjdk.org/jdk/compare/dc5c4148...065abb29 ------------- Changes: https://git.openjdk.org/jdk/pull/21630/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=14 Stats: 1310 lines in 25 files changed: 1250 ins; 13 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From shade at openjdk.org Fri Mar 28 17:03:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 28 Mar 2025 17:03:07 GMT Subject: RFR: 8334046: Set different values for CompLevel_any and CompLevel_all In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 16:12:47 GMT, Cesar Soares Lucas wrote: > Please review this trivial patch to set different values for CompLevel_any and CompLevel_all. > Setting different values for these fields makes the implementation of [this other issue](https://bugs.openjdk.org/browse/JDK-8313713) much cleaner/easier. > Tested on OSX/Linux Aarch64/x86_64 with JTREG. Looks sensible to me, but @veresov should really take a look. ------------- Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24298#pullrequestreview-2726158278 From vlivanov at openjdk.org Fri Mar 28 17:49:31 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 28 Mar 2025 17:49:31 GMT Subject: RFR: 8353176: C1: x86 patching stub always calls Thread::current() In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 11:05:33 GMT, Aleksey Shipilev wrote: > Noticed this while looking at compiled code density. In C1 PatchingStub code, we _always_ perform a runtime call to `Thread::current()`, even though we can rely on `r15_thread` to be available. This currently emits a huge sequence of instructions for register save/restore and the call itself. > > Current code calls to `MacroAssembler::get_thread()`, which is always doing that slowpath. This kind of accident would be less likely / impossible once we cleanup uses of `MacroAssembler::get_thread()` with [JDK-8353174](https://bugs.openjdk.org/browse/JDK-8353174). > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1` + `-XX:TieredStopAtLevel=1` > - [ ] Linux x86_64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24291#pullrequestreview-2726274704 From vlivanov at openjdk.org Fri Mar 28 18:05:21 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 28 Mar 2025 18:05:21 GMT Subject: RFR: 8351156: C1: Remove FPU stack support after 32-bit x86 removal [v2] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 20:07:32 GMT, Aleksey Shipilev wrote: > I think the intent for ShouldNotReachHere()-s in C1 code is to crash out on a gross error, instead of silently miscompiling. It's not an universal convention and it varies even in `c1_LIRAssembler_x86.cpp`. 
For example, `LIR_Assembler::reg2mem` uses asserts for small dynamic dispatch tables while using `ShouldNotReachHere` on default case in top switch. Anyway, it's a minor thing and more of a code style. If we want to crash on such conditions, `guarantee(cond, "")` looks superior to `if (cond) { ... } else { ShouldNotReachHere(); }`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24274#discussion_r2019108812 From shade at openjdk.org Fri Mar 28 18:05:21 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 28 Mar 2025 18:05:21 GMT Subject: RFR: 8351156: C1: Remove FPU stack support after 32-bit x86 removal [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 17:59:21 GMT, Vladimir Ivanov wrote: >> I think the intent for `ShouldNotReachHere()`-s in C1 code is to crash out on a gross error, instead of silently miscompiling. I would not be 100% sure src type and dst reg type are always correct. It is basically `guarantee` in disguise, AFAIU. So I'd prefer to keep it as is. > >> I think the intent for ShouldNotReachHere()-s in C1 code is to crash out on a gross error, instead of silently miscompiling. > > It's not an universal convention and it varies even in `c1_LIRAssembler_x86.cpp`. > > For example, `LIR_Assembler::reg2mem` uses asserts for small dynamic dispatch tables while using `ShouldNotReachHere` on default case in top switch. > > Anyway, it's a minor thing and more of a code style. > > If we want to crash on such conditions, `guarantee(cond, "")` looks superior to `if (cond) { ... } else { ShouldNotReachHere(); }`. Yeah, I tend to agree. 
But I see the code style in majority of C1 code I saw/touched is `ShouldNotReachHere()`, so I prefer to stick with it :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24274#discussion_r2019112261 From jbhateja at openjdk.org Fri Mar 28 18:39:40 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 28 Mar 2025 18:39:40 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v9] In-Reply-To: References: Message-ID: > This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 > > The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). > > Summary of changes included with the patch: > 1. C2 compiler New Vector IR creation. > 2. Auto-vectorization support. > 3. x86 backend implementation. > 4. New IR verification test for each newly supported vector operation. > > Following are the performance numbers of Float16OperationsBenchmark > > System : Intel(R) Xeon(R) Processor code-named Granite rapids > Frequency fixed at 2.5 GHz > > > Baseline > Benchmark (vectorDim) Mode Cnt Score Error Units > Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms > Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms > Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms > Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms > Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms > Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms > Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms > Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms > Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms > 
Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms > Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms > Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms > Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms > Float16OperationsBenchmark.isInfiniteCMovBenchmark 1024 thrpt 2 1962.579 ops/ms... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comment resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22755/files - new: https://git.openjdk.org/jdk/pull/22755/files/6f89f3f3..a25eb507 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=07-08 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22755.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755 PR: https://git.openjdk.org/jdk/pull/22755 From jbhateja at openjdk.org Fri Mar 28 18:39:40 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 28 Mar 2025 18:39:40 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v8] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 11:21:42 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Some re-factoring > > test/hotspot/jtreg/compiler/lib/generators/Generators.java line 172: > >> 170: * Generates uniform float16s in the range of [lo, hi) (inclusive of lo, exclusive of hi). >> 171: */ >> 172: public RestrictableGenerator uniformFloat16s(short lo, short hi) { > > Hmm, passing `short` into this API seems a little strange to me. Because the range `[lo, hi)` doesn't really make sense... rather it is the bits by `lo` and `hi` interpreted as `Float16`... but isn't that a little cumbersome? 
Why not pass in `Float16` instead? We want to avoid referencing an incubating class for now, as every test using the generator would then need `--add-modules=jdk.incubator.vector`. Also, this will not pose any issue when we add a new Generator for short values, since the underlying framework APIs will be different from the ones used for the float16 generator with the Short carrier type. > test/hotspot/jtreg/compiler/lib/generators/Generators.java line 375: > >> 373: * @return Random float16 generator. >> 374: */ >> 375: public Generator float16s() { > > Why do you not generate a `Float16` here? This would probably conflict with a `Short` generator that we might add in the future... To avoid dependency on an incubating module. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r2019151114 PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r2019154395 From psandoz at openjdk.org Fri Mar 28 18:41:33 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 28 Mar 2025 18:41:33 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v3] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 06:37:54 GMT, Rohit Arul Raj wrote: >> In JDK-8309130, Array sort was optimized using AVX512 SIMD instructions for x86_64. Currently, this optimization has been disabled for AMD Zen 4 [JDK-8317763] due to bad performance of compressstoreu. >> Ref: https://www.reddit.com/r/java/comments/171t5sj/heads_up_openjdk_implementation_of_avx512_based/. >> >> This patch enables Zen 4 to pick optimized AVX2 version of SIMD sort and Zen 5 picks the AVX512 version. >> >> JTREG Tests: Completed Tier1 & Tier2 tests on Zen4 & Zen5 - No Regressions. >> >> Attaching ArraySort performance data for Zen4 & Zen5.
>> [Zen4-ArraySort-Data.txt](https://github.com/user-attachments/files/19245831/Zen4-ArraySort-Data.txt) >> [Zen5-ArraySort-Data.txt](https://github.com/user-attachments/files/19245833/Zen5-ArraySort-Data.txt) > > Rohit Arul Raj has updated the pull request incrementally with one additional commit since the last revision: > > Refactor 'supports_avx512_simd_sort' code to make it easily readable Marked as reviewed by psandoz (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24053#pullrequestreview-2726391977 From vlivanov at openjdk.org Fri Mar 28 18:46:45 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 28 Mar 2025 18:46:45 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v3] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 15:32:30 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]` as a >> consequence, `i` is known to be `[0, arraySize-1]`. The range check >> can be eliminated. >> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. >> >> What I propose here is that when the `CastII` becomes dead, any CFG >> paths that use the `CastII` node is made unreachable. 
So in pseudo code: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered. >> >> The patch applies this to all `Type` nodes because, with 8275202, I also ran >> into some rare corner cases with other types of nodes. The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). >> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII) >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out-of-bounds value occurs. >> >> This is likely fixable some other way (even though it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that more of them are hiding), I think it would >> make sense to try some other way of approaching them. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8349479 > - review > - whitespace > - fix & test Looks good. src/hotspot/share/opto/node.cpp line 3134: > 3132: size_t len = ss.size() + 1; > 3133: char* arena_str = NEW_ARENA_ARRAY(igvn->C->comp_arena(), char, len); > 3134: memcpy(arena_str, ss.base(), len); Does it make sense to move it into `stringStream::as_string()`?
`stringStream::as_string()` already handles resource area and C-heap allocations. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23468#pullrequestreview-2726399792 PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2019162213 From vlivanov at openjdk.org Fri Mar 28 18:55:13 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 28 Mar 2025 18:55:13 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v3] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 06:37:54 GMT, Rohit Arul Raj wrote: >> In JDK-8309130, Array sort was optimized using AVX512 SIMD instructions for x86_64. Currently, this optimization has been disabled for AMD Zen 4 [JDK-8317763] due to bad performance of compressstoreu. >> Ref: https://www.reddit.com/r/java/comments/171t5sj/heads_up_openjdk_implementation_of_avx512_based/. >> >> This patch enables Zen 4 to pick optimized AVX2 version of SIMD sort and Zen 5 picks the AVX512 version. >> >> JTREG Tests: Completed Tier1 & Tier2 tests on Zen4 & Zen5 - No Regressions. >> >> Attaching ArraySort performance data for Zen4 & Zen5. >> [Zen4-ArraySort-Data.txt](https://github.com/user-attachments/files/19245831/Zen4-ArraySort-Data.txt) >> [Zen5-ArraySort-Data.txt](https://github.com/user-attachments/files/19245833/Zen5-ArraySort-Data.txt) > > Rohit Arul Raj has updated the pull request incrementally with one additional commit since the last revision: > > Refactor 'supports_avx512_simd_sort' code to make it easily readable Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24053#pullrequestreview-2726422407 From thartmann at openjdk.org Fri Mar 28 19:27:15 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 28 Mar 2025 19:27:15 GMT Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 09:16:17 GMT, Marc Chevalier wrote: > If `TypeInstKlassPtr` represents an array type, it has to be `java.lang.Object`. By contraposition, if it is not `java.lang.Object`, we can conclude it is not an array, and we can skip some array checks, for instance. > > In this PR, we improve this deduction with interface-based reasoning: arrays implement only Cloneable and Serializable, so if a type implements anything else, it cannot be an array. > > This change partially reverts the changes from [JDK-8348631](https://bugs.openjdk.org/browse/JDK-8348631) (#23331) (in `LibraryCallKit::generate_array_guard_common`) and the test still passes. > > The way interfaces are checked could be done differently. The current situation is a balance between visibility (not leaking too many things that are explicitly private), not having overly general methods for one use-case, and avoiding too concrete (and brittle) interfaces. > > Tested with tier1..3, hs-precheckin-comp and hs-comp-stress > > Thanks, > Marc Right, I was hoping that there would be some other suitable users of `GraphKit::get_layout_helper` that would now be folded but all current uses either trap or don't handle both arrays and non-arrays (and therefore wouldn't fold). So I agree, adding an IR framework test does not make sense. The existing test is sufficient.
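
Editorial sketch (not part of the original message): the interface-based reasoning quoted above can be observed at the Java level, where `Class.getInterfaces()` on an array class reports `Cloneable` and `java.io.Serializable`. The helper below is a hypothetical Java-level analogue of the subset check discussed for C2's type system; the class and method names are assumptions for illustration and do not correspond to anything in HotSpot.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: a Java-level analogue of the subset check,
// not the C2 implementation. All names here are hypothetical.
class ArrayInterfacesSketch {
    // Returns true if a type implementing exactly these interfaces could
    // still be an array, i.e. the given set is a subset of the interfaces
    // that array classes implement.
    static boolean couldBeArray(Class<?>... interfaces) {
        List<Class<?>> arrayInterfaces = Arrays.asList(int[].class.getInterfaces());
        return arrayInterfaces.containsAll(Arrays.asList(interfaces));
    }

    public static void main(String[] args) {
        // The interfaces implemented by an array class.
        System.out.println(Arrays.toString(int[].class.getInterfaces()));
        // Cloneable alone is compatible with being an array.
        System.out.println(couldBeArray(Cloneable.class));
        // Comparable is not an array interface, so the type cannot be an array.
        System.out.println(couldBeArray(Comparable.class));
    }
}
```

A "has a non-array interface" predicate is then just the negation of `couldBeArray`, which matches the shape of the `contains`-based check discussed in the review thread.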
------------- PR Comment: https://git.openjdk.org/jdk/pull/24245#issuecomment-2762234661 From thartmann at openjdk.org Fri Mar 28 19:27:15 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 28 Mar 2025 19:27:15 GMT Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces In-Reply-To: References: <-Ri4lJUzCkI9yLG-kGwTGeAhd453SDgt_qvoB1iw4_A=.f3e126ab-a4ff-4f7f-80a7-c6e739cc6727@github.com> Message-ID: On Fri, 28 Mar 2025 09:38:19 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/type.cpp line 3684: >> >>> 3682: } >>> 3683: >>> 3684: bool TypeInterfaces::has_non_array_interface() const { >> >> What about using `TypeAryPtr::_array_interfaces->contains(_interfaces);` instead? > > Almost! > > return !TypeAryPtr::_array_interfaces->contains(this); > > Contains is about TypeInterfaces, that is set of interfaces. So I just need to check that `this` is not a sub-set of array interfaces. That should do it. Now I'm confused, isn't this what I proposed? I didn't check the exact syntax, I just wondered if the `TypeInterfaces::contains` method couldn't be used instead of adding a new method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24245#discussion_r2019219027 From vlivanov at openjdk.org Fri Mar 28 20:15:21 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 28 Mar 2025 20:15:21 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v44] In-Reply-To: References: Message-ID: <2coxPhTGtiC8yWQMkX391-2RACf31T3EjXL8jow39u0=.2a6f7376-1d16-46d5-b730-2fb8931f596c@github.com> On Mon, 24 Mar 2025 18:29:34 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. 
>> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. >> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. > > Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: > > Undo accidental changes to Int tests Overall, looks good. Very nice unit test! 2 comments: * naming: why not simply `xor_upper_bound` instead of `calc_xor_upper_bound_of_non_neg`? * ceremony around exposing `calc_xor_upper_bound_of_non_neg` to gtest: have you tried to move it under `src/share/hotspot/utilities/` and include from both `addnode.cpp` and `test_xor_node.cpp`. ------------- PR Review: https://git.openjdk.org/jdk/pull/23089#pullrequestreview-2726627039 From eastigeevich at openjdk.org Fri Mar 28 20:20:41 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 28 Mar 2025 20:20:41 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache [v3] In-Reply-To: References: Message-ID: > This benchmark is used to check performance impact of the code cache being sparse. > > We use C2 compiler to compile the same Java method multiple times to produce as many code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. 
We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods. > > Results: code region size 2M (2097152) bytes > - Intel Xeon Platinum 8259CL > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |19.577 |0.619 |us/op | | > |128 |32 |4 |22.968 |0.314 |us/op |17.30% | > |128 |48 |3 |22.245 |0.388 |us/op |13.60% | > |128 |64 |2 |23.874 |0.84 |us/op |21.90% | > |128 |80 |2 |23.786 |0.231 |us/op |21.50% | > |128 |96 |1 |26.224 |1.16 |us/op |34% | > |128 |112 |1 |27.028 |0.461 |us/op |38.10% | > |256 |1 |256 |47.43 |1.146 |us/op | | > |256 |32 |8 |63.962 |1.671 |us/op |34.90% | > |256 |48 |5 |63.396 |0.247 |us/op |33.70% | > |256 |64 |4 |66.604 |2.286 |us/op |40.40% | > |256 |80 |3 |59.746 |1.273 |us/op |26% | > |256 |96 |3 |63.836 |1.034 |us/op |34.60% | > |256 |112 |2 |63.538 |1.814 |us/op |34% | > |512 |1 |512 |172.731 |4.409 |us/op | | > |512 |32 |16 |206.772 |6.229 |us/op |19.70% | > |512 |48 |11 |215.275 |2.228 |us/op |24.60% | > |512 |64 |8 |212.962 |2.028 |us/op |23.30% | > |512 |80 |6 |201.335 |12.519 |us/op |16.60% | > |512 |96 |5 |198.133 |6.502 |us/op |14.70% | > |512 |112 |5 |193.739 |3.812 |us/op |12.20% | > |768 |1 |768 |325.154 |5.048 |us/op | | > |768 |32 |24 |346.298 |20.196 |us/op |6.50% | > |768 |48 |16 |350.746 |2.931 |us/op |7.90% | > |768 |64 |12 |339.445 |7.927 |us/op |4.40% | > |768 |80 |10 |347.408 |7.355 |us/op |6.80% | > |768 |96 |8 |340.983 |3.578 |us/op |4.90% | > |768 |112 |7 |353.949 |2.98 |us/op |8.90% | > |1024 |1 |1024 |368.352 |5.961 |us/op | | > |1024 |32 |32 |463.822 |6.274 |us/op |25.90% | > |1024 |48 |21 |457.674 |15.144 |us/op |24.20% | > |1024 |64 |16 |477.694 |0.986 |us/op |29.70% | > |1024 |80 |13 |484.901 |32.601 |us/op |31.60% | > |1024 |96 |11 |480.8 |27.088 |us/op 
|30.50% | > |1024 |112 |9 |474.416 |10.053 |us/op |28.80% | > > - AArch64 Neoverse N1 > > |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff | > |--- |--- |--- |--- |--- |--- |--- | > |128 |1 |128 |25.297 |0.792 |us/op | | > |128 |32 |4 |31.451... Evgeny Astigeevich has updated the pull request incrementally with two additional commits since the last revision: - Document assumptions about code placement in CodeCache - Address bulasevich comment: too many parameters values ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23831/files - new: https://git.openjdk.org/jdk/pull/23831/files/ef7d9898..7de15419 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23831&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23831&range=01-02 Stats: 12 lines in 1 file changed: 10 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23831.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23831/head:pull/23831 PR: https://git.openjdk.org/jdk/pull/23831 From eastigeevich at openjdk.org Fri Mar 28 20:23:16 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 28 Mar 2025 20:23:16 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache [v2] In-Reply-To: <-Ebzc6t0saRWpkdRmuj-6kCbQSuYiN3pxnEzMheW2dE=.b5d34814-d9ab-4869-9b8a-e20f1a0ea58c@github.com> References: <-Ebzc6t0saRWpkdRmuj-6kCbQSuYiN3pxnEzMheW2dE=.b5d34814-d9ab-4869-9b8a-e20f1a0ea58c@github.com> Message-ID: On Tue, 25 Mar 2025 02:30:24 GMT, Boris Ulasevich wrote: >> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: >> >> Separate active methods and method calling them with 128Mb dummy space > > The results cover different AARCH Neoverse implementations. All three CPUs show performance degradation as sparsity increases (i.e., as groupCount grows). This seems to be a common feature of the Neoverse V2 architecture. 
Azure Cobalt also degrades more sharply as the number of active methods increases. > > This table, along with earlier measurements, highlights the significant impact of code sparsity across a wide range of platforms. Typically, calling distant methods - rather than those grouped closely together - results in a slowdown of approximately 20%. > > SparseCodeCache | ? | G4 | ? | Azure Cobalt | ? | Google Axion | ? > -- | -- | -- | -- | -- | -- | -- | -- > activeMethodCount | groupCount | us/op | ? | us/op | ? | us/op | ? > 128 | 1 | 11.972 | 0.004 | 11.092 | 0.007 | 11.201 | 0.059 > 128 | 32 | 13.622 | 0.092 | 15.808 | 0.779 | 11.928 | 0.013 > 128 | 48 | 13.217 | 0.072 | 15.937 | 0.498 | 12.126 | 0.009 > 128 | 64 | 13.668 | 0.04 | 16.137 | 0.517 | 12.171 | 0.139 > 128 | 80 | 13.986 | 0.127 | 17.681 | 0.262 | 12.525 | 0.033 > 128 | 96 | 14.594 | 0.055 | 18.25 | 0.795 | 12.979 | 0.051 > 128 | 112 | 14.77 | 0.078 | 18.529 | 1.004 | 13.129 | 0.049 > 256 | 1 | 23.998 | 0.019 | 22.417 | 0.006 | 22.409 | 0.003 > 256 | 32 | 26.273 | 0.036 | 33.329 | 0.949 | 25.097 | 0.043 > 256 | 48 | 26.61 | 0.063 | 34.566 | 0.343 | 24.771 | 0.118 > 256 | 64 | 26.959 | 0.085 | 35.953 | 0.456 | 24.443 | 0.028 > 256 | 80 | 27.646 | 0.089 | 38.569 | 4.495 | 25.245 | 0.027 > 256 | 96 | 27.829 | 0.128 | 37.749 | 0.991 | 25.536 | 0.031 > 256 | 112 | 28.298 | 0.064 | 40.261 | 0.155 | 25.787 | 0.016 > 512 | 1 | 48.181 | 0.032 | 68.768 | 0.537 | 44.863 | 0.004 > 512 | 32 | 53.157 | 0.044 | 94.262 | 2.801 | 50.037 | 0.038 > 512 | 48 | 55.13 | 0.052 | 106.928 | 3.513 | 54.611 | 0.044 > 512 | 64 | 56.609 | 0.123 | 103.403 | 0.708 | 53.906 | 0.039 > 512 | 80 | 57.146 | 0.091 | 112.929 | 2.522 | 52.923 | 0.081 > 512 | 96 | 59.038 | 0.092 | 141.291 | 2.346 | 56.018 | 0.054 > 512 | 112 | 60.647 | 0.331 | 137.491 | 11.441 | 56.705 | 0.117 > 768 | 1 | 77.086 | 0.402 | 138.572 | 2.444 | 68.464 | 0.056 > 768 | 32 | 89.599 | 0.14 | 159.353 | 4.639 | 94.478 | 1.129 > 768 | 48 | 94.312 | 0.33 | 177.518 | 1.728 | 
99.704 | 0.131 > 768 | 64 | 94.243 | 0.218 | 182.263 | 2.634 | 90.027 | 0.19 > 768 | 80 | 95.566 | 0.068 | 185.748 | 32.128 | 96.61 | 0.157 > 768 | 96 | 99.435 | 0.323 | 195.603 | 13.653 | 102.222 | 0.027 > 768 | 112 | 105.814 | 0.366 | 216.653 | 1.694 | 103.918 | 0.497 > 1024 | 1 | 110.407 | 1.27 | 203.428 | 2.049 | 97.032 | 0.739 > 1024 | 32 | 137... @bulasevich, I reduced a number of values for the parameters and documented assumptions. @vnkozlov, could you please reapprove? No changes are made in the main code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2762364033 From shade at openjdk.org Fri Mar 28 20:42:05 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 28 Mar 2025 20:42:05 GMT Subject: RFR: 8353192: C2: Clean up x86 backend after 32-bit x86 removal Message-ID: Piece-wise cleanup of C2 x86 backend. C2_MacroAssembler, x86.ad and related files are the target for this cleanup. Additional testing: - [ ] Linux x86_64 server fastdebug, `all` - [ ] Linux x86_64 server fastdebug, `all` + `-XX:-TieredCompilation` ------------- Commit messages: - Touchup - Fix Changes: https://git.openjdk.org/jdk/pull/24300/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24300&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353192 Stats: 511 lines in 5 files changed: 1 ins; 439 del; 71 mod Patch: https://git.openjdk.org/jdk/pull/24300.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24300/head:pull/24300 PR: https://git.openjdk.org/jdk/pull/24300 From sviswanathan at openjdk.org Fri Mar 28 21:20:18 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 28 Mar 2025 21:20:18 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v6] In-Reply-To: References: Message-ID: <4pVsbXILQQgsiSnldLRVf1fziUMF6PrqkEnr81RoFMg=.a79353fd-5dc2-4c64-8958-01cbc0557618@github.com> On Fri, 28 Mar 2025 04:50:07 GMT, Jatin Bhateja wrote: >> This bugfix patch adds the special handling as per 
x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. >> >> Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.felixcloutier.com/x86/vminsh >> [2] https://www.felixcloutier.com/x86/vmaxsh > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding custom NaN generator test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 58: > 56: if (Float16.isNaN(actual) && Float16.isNaN(expected)) { > 57: return false; > 58: } This should be: if (Float16.isNaN(actual) ^ Float16.isNaN(expected)) { return true; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2019389243 From sviswanathan at openjdk.org Fri Mar 28 21:20:19 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 28 Mar 2025 21:20:19 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v6] In-Reply-To: <4pVsbXILQQgsiSnldLRVf1fziUMF6PrqkEnr81RoFMg=.a79353fd-5dc2-4c64-8958-01cbc0557618@github.com> References: <4pVsbXILQQgsiSnldLRVf1fziUMF6PrqkEnr81RoFMg=.a79353fd-5dc2-4c64-8958-01cbc0557618@github.com> Message-ID: On Fri, 28 Mar 2025 21:13:03 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding custom NaN generator > > test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 58: > >> 56: if (Float16.isNaN(actual) && Float16.isNaN(expected)) { >> 57: 
return false;
>> 58:         }
>
> This should be:
> if (Float16.isNaN(actual) ^ Float16.isNaN(expected)) {
>     return true;
> }

Basically assert if one is NaN and other is not.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2019397591

From vlivanov at openjdk.org Fri Mar 28 21:49:25 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 28 Mar 2025 21:49:25 GMT
Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v2]
In-Reply-To:
References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
Message-ID:

On Wed, 26 Mar 2025 08:33:58 GMT, Marc Chevalier wrote:

>> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments.
>> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached.
>>
>> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all.
>>
>> tl;dr:
>> - C1: no problem, no change
>> - C2:
>>   - with intrinsics:
>>     - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
>>     - without overflow: no problem, no change
>>   - without intrinsics: no problem, no change
>>
>> Before the fix:
>>
>> Benchmark (SIZE) Mode Cnt Score Error Units
>> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ± 0.048 ms/op
>> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ± 58.238 ms/op
>> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ± 0.842 ms/op
>> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ± 229.425 ms/op
>> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ± 22.244 ms/op
>> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ± 279.003 ms/op
>> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ± 0.810 ms/op
>> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ± 141.792 ms/op
>> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ±
12.822 ms/op
>> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ± 274.396 ms/op
>> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ± 3.316 ms/op
>> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ± 71.879 ms/op
>> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ± 0.587 ms/op
>> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ± 252.137 ms/op
>> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ± 5.718 ms/op
>> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ± 147.432 ms/op
>> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ± 0.027 ms/op
>> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ± 30.841 ms/op
>> MathExact.C1_1.loop...
>
> Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
> - Use builtin_throw
> - Merge branch 'master' into fix/Deoptimization-and-re-compilation-cycle-with-C2-compiled-code
> - More exhaustive bench
> - Limit inlining of math Exact operations in case of too many deopts

Thanks, Marc.

It looks a bit too convoluted to me. IMO an unconditional call to `builtin_throw`, plus `too_many_traps` check should do the job. Do I miss something important here?

src/hotspot/share/opto/graphKit.hpp line 279:

> 277: // The JVMS must allow the bytecode to be re-executed via an uncommon trap.
> 278: // If `exception_object` is nullptr, the exception to throw will be guessed based on `reason`
> 279: void builtin_throw(Deoptimization::DeoptReason reason, ciInstance* exception_object = nullptr);

Please, introduce a new overload instead.
I suggest to extract Deoptimization::DeoptReason -> ciInstance mapping into a helper method and turn `void builtin_throw(Deoptimization::DeoptReason reason)` into a wrapper: void GraphKit::builtin_throw(Deoptimization::DeoptReason reason) { builtin_throw(reason, exception_on_deopt(reason)); } src/hotspot/share/opto/library_call.cpp line 2035: > 2033: > 2034: if (use_builtin_throw) { > 2035: builtin_throw(Deoptimization::Reason_intrinsic, env()->ArithmeticException_instance()); I suggest to unconditionally call `builtin_throw()`. It should handle `uncommon_trap` case as well. What makes sense is to ensure that `builtin_throw()` doesn't change deoptimization reason. It can be implemented with an extra argument to new `GraphKit::builtin_throw` overload (e.g., `bool allow_deopt_reason_none`). src/hotspot/share/opto/library_call.cpp line 2054: > 2052: // instead of bailing out on intrinsic or potentially deopting, let's do that! > 2053: use_builtin_throw = true; > 2054: } else if (too_many_traps(Deoptimization::Reason_intrinsic)) { Why `too_many_traps(Deoptimization::Reason_intrinsic)` check is not enough here? ------------- PR Review: https://git.openjdk.org/jdk/pull/23916#pullrequestreview-2726864135 PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r2019432922 PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r2019444895 PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r2019449996 From vlivanov at openjdk.org Fri Mar 28 21:51:09 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 28 Mar 2025 21:51:09 GMT Subject: RFR: 8351233: [ASAN] avx2-emu-funcs.hpp:151:20: error: =?UTF-8?B?4oCYRC44MjE4OOKAmQ==?= is used uninitialized In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 03:35:20 GMT, SendaoYan wrote: > Hi all, > > The return type of function `const __m256i &perm` is `__m256i`, so `const __m256i &perm` should be replaced as 'const __m256i perm'. 
> > The function implementation in gcc/clang compiler header: > > 1. gcc: lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h > > > extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) > _mm256_loadu_si256 (__m256i_u const *__P) > { > return *__P; > } > > > 2. clang: lib64/clang/17/include/avxintrin.h > > > static __inline __m256i __DEFAULT_FN_ATTRS > _mm256_loadu_si256(__m256i_u const *__p) > { > struct __loadu_si256 { > __m256i_u __v; > } __attribute__((__packed__, __may_alias__)); > return ((const struct __loadu_si256*)__p)->__v; > } > > > Additional testing: > > - [x] jtreg tests(include tier1/2/3 etc.) on linux-x64(AMD EPYC 9T24 96-Core Processor) with release build > - [x] jtreg tests(include tier1/2/3 etc.) on linux-x64(AMD EPYC 9T24 96-Core Processor) with fastdebug build Looks good. (I'll submit it for testing.) ------------- PR Review: https://git.openjdk.org/jdk/pull/23925#pullrequestreview-2726902632 From vlivanov at openjdk.org Fri Mar 28 22:04:20 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 28 Mar 2025 22:04:20 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks In-Reply-To: References: Message-ID: <4ZbIg2yTtJjQUwkCjO_Klnv0e4_DLNaRzxxpJa4g9RU=.9f32f9f2-b50c-495d-8188-3207a061e7b3@github.com> On Wed, 12 Mar 2025 19:45:41 GMT, Aleksey Shipilev wrote: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. 
Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Good catch, Aleksey! What do you think about making 1 step further and encapsulating weak/strong reference handling into a helper class? Also, as an optimization idea: seems like weak + strong handles form a union (none -> weak -> strong). So, once a strong reference is captured, corresponding weak handle can be cleared straight away. 
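
To make the helper-class idea a bit more concrete, here is a rough Java analogy. It is not HotSpot code: the `HolderRef` class and its method names are hypothetical, and `java.lang.ref.WeakReference` stands in for the `OopStorage`-backed weak and strong handles. The sketch only models the none -> weak -> strong progression and drops the weak reference as soon as the strong one is captured.

```java
import java.lang.ref.WeakReference;

// Hypothetical helper, an analogy only: tracks a holder as none, weak, or strong.
final class HolderRef<T> {
    private WeakReference<T> weak; // non-null only while the holder is weakly tracked
    private T strong;              // non-null only after promotion

    HolderRef(T holder) {
        // A null holder models the "none" state: nothing to track, never unloaded.
        this.weak = (holder == null) ? null : new WeakReference<>(holder);
    }

    // True only if we were tracking weakly and the referent has been collected.
    boolean isUnloaded() {
        return weak != null && weak.get() == null;
    }

    // Capture a strong reference; returns false if the holder is already gone
    // or if there was nothing to track in the first place.
    boolean promote() {
        if (strong != null) {
            return true;
        }
        T holder = (weak == null) ? null : weak.get();
        if (holder == null) {
            return false;
        }
        strong = holder;
        weak = null; // the weak reference is redundant once the strong one exists
        return true;
    }

    boolean isStrong() {
        return strong != null;
    }

    boolean hasWeak() {
        return weak != null;
    }
}
```

With this shape, callers never need to ask what kind of handle they hold; the state is implied by which field is set.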
------------- PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2726922896 From sviswanathan at openjdk.org Fri Mar 28 22:17:22 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 28 Mar 2025 22:17:22 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v6] In-Reply-To: References: <4pVsbXILQQgsiSnldLRVf1fziUMF6PrqkEnr81RoFMg=.a79353fd-5dc2-4c64-8958-01cbc0557618@github.com> Message-ID: On Fri, 28 Mar 2025 21:17:21 GMT, Sandhya Viswanathan wrote: >> test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 58: >> >>> 56: if (Float16.isNaN(actual) && Float16.isNaN(expected)) { >>> 57: return false; >>> 58: } >> >> This should be: >> if (Float16.isNaN(actual) ^ Float16.isNaN(expected)) { >> return true; >> } > > Basically assert if one is NaN and other is not. On further thought what you have also works. Though we could simplify the assertionCheck method to just one statement: public static boolean assertionCheck(Float16 actual, Float16 expected) { return !actual.equals(expected); } This is because, the equals method takes care of NaNs. The [equals](https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/Double.html#equals(java.lang.Object)) uses [representation equivalence](https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/Double.html#repEquivalence), defining NaN arguments to be equal to each other. 
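
A small sketch of the difference between the two checks, using `Float` as a stand-in because `Float16` lives in an incubator module. The class and method names are hypothetical, and the sketch assumes `Float16.equals` follows the same representation-equivalence rule that `Float.equals` and `Double.equals` document:

```java
// Hypothetical stand-in for the test helper, using Float instead of Float16.
final class NaNCheckSketch {
    // Returns true when the values differ, i.e. when the test should fail.
    // equals() treats every NaN as equal to every other NaN, and treats
    // +0.0 and -0.0 as different values.
    static boolean mismatch(Float actual, Float expected) {
        return !actual.equals(expected);
    }

    // The xor form from the earlier comment: true if exactly one side is NaN.
    static boolean onlyOneIsNaN(float actual, float expected) {
        return Float.isNaN(actual) ^ Float.isNaN(expected);
    }

    public static void main(String[] args) {
        // A NaN with a non-default payload, similar in spirit to a custom NaN generator.
        float customNaN = Float.intBitsToFloat(0x7fc00123);
        System.out.println("NaN == NaN:            " + (Float.NaN == Float.NaN));
        System.out.println("mismatch(NaN, custom): " + mismatch(Float.NaN, customNaN));
        System.out.println("mismatch(0.0, -0.0):   " + mismatch(0.0f, -0.0f));
        System.out.println("onlyOneIsNaN(NaN, 1):  " + onlyOneIsNaN(Float.NaN, 1.0f));
    }
}
```

The single `equals` call covers both special cases the test cares about: NaN payloads do not matter, while the sign of zero does.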
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2019488902 From vlivanov at openjdk.org Fri Mar 28 22:25:29 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 28 Mar 2025 22:25:29 GMT Subject: RFR: 8352316: More MergeStoreBench In-Reply-To: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: On Wed, 19 Mar 2025 03:28:59 GMT, Shaojin Wen wrote: > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works On naming: * `null` usages confuse me (`NULL_STR` et al, `putNull`). Why "null" is special? Can you just use an arbitrary 4-byte string? * PR proposes a mix of snake & camel case while the code around uses camel case. Worth considering grouping similar benchmarks into an inner class. ------------- PR Review: https://git.openjdk.org/jdk/pull/24108#pullrequestreview-2726955147 From swen at openjdk.org Sat Mar 29 00:47:09 2025 From: swen at openjdk.org (Shaojin Wen) Date: Sat, 29 Mar 2025 00:47:09 GMT Subject: RFR: 8352316: More MergeStoreBench In-Reply-To: References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: <9Q7l7Ki84PTChfZ7fG7BWz4NiJnZxR2h1Blqgd1LPLY=.561a90ed-45fb-4d13-9f66-3956a10e5d00@github.com> On Thu, 27 Mar 2025 05:17:44 GMT, Shaojin Wen wrote: >> I'm a developer of fastjson2. According to third-party benchmarks from https://github.com/fabienrenaud/java-json-benchmark, our library demonstrates the best performance. I would like to contribute some of these optimization techniques to OpenJDK, ideally by having C2 (the JIT compiler) directly support them. >> >> Below is an example related to this PR. 
We have a JavaBean that needs to be serialized to a JSON string:
>>
>>
>> * JavaBean
>>
>> class Bean {
>>     public int value;
>> }
>>
>>
>> * Target JSON Output
>>
>> {"value":123}
>>
>>
>> * CodeGen-Generated JSONSerializer
>> fastjson2 uses ASM to generate a serializer class like the following. The methods writeNameValue0, writeNameValue1, and writeNameValue2 are candidate implementations. Among them, writeNameValue2 is the fastest when the field name length is 8, as it leverages UNSAFE.putLong for direct memory operations:
>>
>>
>> class BeanJSONSerializer {
>>     private static final String name = "\"value\":";
>>     private static final byte[] nameBytes = name.getBytes();
>>     private static final long nameLong = UNSAFE.getLong(nameBytes, ARRAY_BYTE_BASE_OFFSET);
>>
>>     int writeNameValue0(byte[] bytes, int off, int value) {
>>         name.getBytes(0, 8, bytes, off);
>>         off += 8;
>>         return writeInt32(bytes, off, value);
>>     }
>>
>>     int writeNameValue1(byte[] bytes, int off, int value) {
>>         System.arraycopy(nameBytes, 0, bytes, off, 8);
>>         off += 8;
>>         return writeInt32(bytes, off, value);
>>     }
>>
>>
>>     int writeNameValue2(byte[] bytes, int off, int value) {
>>         UNSAFE.putLong(bytes, ARRAY_BYTE_BASE_OFFSET + off, nameLong);
>>         off += 8;
>>         return writeInt32(bytes, off, value);
>>     }
>> }
>>
>>
>> We propose that the C2 compiler could optimize cases where the field name length is 4 or 8 bytes by automatically using direct memory operations similar to writeNameValue2. This would eliminate the need for manual unsafe operations in user code and improve serialization performance for common patterns.
>
>> @wenshao Do you have any insight from this benchmark? What was your motivation for it?
>>
>> I also wonder if an IR test for some of the cases would be helpful. IR tests give us more info about what the compiler produced, and if there is a change in VM behaviour the IR test catches it in regular testing. Benchmarks are not run regularly, and regressions would therefore not be caught.
>
> I submitted this benchmark to prove that the performance of System.arraycopy or String.getBytes can be improved by Unsafe.putInt/putLong. I hope C2 can do this optimization automatically.

> @wenshao
>
> > I hope C2 can do this optimization automatically.
>
> Did you check if it does or does not do that? Can you investigate what the generated code is for `String.getBytes`? Does that not create an allocation, which would make things much slower? And it may even do some more complicated encoding things, which is a lot of overhead. So that would explain your performance result, at least partially, right?
>
> I'm also not convinced that you are comparing apples to apples here.
>
> ```
> Benchmark Mode Cnt Score Error Units
> MergeStoreBench.putNull_arraycopy avgt 5 8029.622 ± 60.856 ns/op
> ```
>
> This does an array copy, so an array load AND an array store, right?
>
> This one even has to do allocations, loads and stores (though you need to investigate and check):
>
> ```
> MergeStoreBench.putNull_getBytes avgt 5 6171.538 ± 5.845 ns/op
> ```
>
> On the other hand, this does NOT have to do an array load or allocations, just a simple store:
>
> ```
> MergeStoreBench.putNull_unsafePutInt avgt 5 235.302 ± 2.004 ns/op
> ```
>
> Is there actually a benchmark in this series that makes use of individual byte stores that get merged to an int store? Because that is the whole point of MergeStores, right?
>
> Do you really need to use `String.getBytes`? I mean maybe with proper escape analysis etc the whole allocation could be avoided. But that would require a much deeper analysis.
>
> Back to this:
>
> > I hope C2 can do this optimization automatically.
>
> Can you investigate what code it generates, and what kinds of optimizations are missing to make it close in performance to the `Unsafe` benchmark?
>
> I don't have time to do all the deep investigations myself. But feel free to ask me if you have more questions.
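
For reference, the shape asked about above (adjacent constant byte stores versus one wider store) can be sketched as follows. This is only an illustration: the `MergeStoreSketch` class and its method names are hypothetical, and it uses a little-endian `VarHandle` byte-array view rather than `Unsafe`. Whether C2 actually merges the first form into the second is exactly what a benchmark or IR test would have to confirm.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration of the two store shapes for the 4-byte constant "null".
final class MergeStoreSketch {
    // View a byte[] as little-endian ints, so one set() writes four bytes.
    private static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Four adjacent single-byte stores: the pattern a store-merging pass looks for.
    static void putNullByteStores(byte[] dst, int off) {
        dst[off]     = (byte) 'n';
        dst[off + 1] = (byte) 'u';
        dst[off + 2] = (byte) 'l';
        dst[off + 3] = (byte) 'l';
    }

    // One 4-byte store. 0x6C6C756E is "null" packed little-endian:
    // 'n' = 0x6E in the lowest byte, then 'u' = 0x75, 'l' = 0x6C, 'l' = 0x6C.
    static void putNullIntStore(byte[] dst, int off) {
        INT_LE.set(dst, off, 0x6C6C756E);
    }

    public static void main(String[] args) {
        byte[] a = new byte[8];
        byte[] b = new byte[8];
        putNullByteStores(a, 2);
        putNullIntStore(b, 2);
        System.out.println("same bytes: " + Arrays.equals(a, b));
    }
}
```

Neither form loads from a source array or allocates, so comparing them isolates the store pattern from the arraycopy and `getBytes` overheads discussed above.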
By default, in OpenJDK, COMPACT_STRINGS = true, and the String coder without UTF16 characters is LATIN1, which is implemented using System.arraycopy. However, since String is immutable and System.arraycopy is directly performed on byte[], C2 should have more opportunities for optimization. class String { @Stable private final byte[] value; private final byte coder; boolean isLatin1() { return COMPACT_STRINGS && coder == LATIN1; } public void getBytes(int srcBegin, int srcEnd, byte[] dst, int dstBegin) { checkBoundsBeginEnd(srcBegin, srcEnd, length()); Objects.requireNonNull(dst); checkBoundsOffCount(dstBegin, srcEnd - srcBegin, dst.length); if (isLatin1()) { StringLatin1.getBytes(value, srcBegin, srcEnd, dst, dstBegin); } else { StringUTF16.getBytes(value, srcBegin, srcEnd, dst, dstBegin); } } } class StringLatin1 { public static void getBytes(byte[] value, int srcBegin, int srcEnd, byte[] dst, int dstBegin) { System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin); } } ------------- PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2762946069 From swen at openjdk.org Sat Mar 29 01:13:30 2025 From: swen at openjdk.org (Shaojin Wen) Date: Sat, 29 Mar 2025 01:13:30 GMT Subject: RFR: 8352316: More MergeStoreBench In-Reply-To: References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: On Fri, 28 Mar 2025 22:22:33 GMT, Vladimir Ivanov wrote: > On naming: > > * `null` usages confuse me (`NULL_STR` et al, `putNull`). Why "null" is special? Can you just use an arbitrary 4-byte string? "null" is very common, here, because its length is 4. When coder = LATIN1, the length of byte[] value is 4, and when coder = UTF16, the length of byte[] value is 8, which is easy to compare with Unsafe.putInt/putLong. If the string is not a multiple of 4, we can also use a combination. For example, when the length is 5, we can use the putInt + putByte combination. 
String str = "a1234";
str.getBytes(0, 5, bytes, off);

UNSAFE.putInt(bytes, Unsafe.ARRAY_BYTE_BASE_OFFSET + off, 0x33323161); // 0x33323161 is "a123"
UNSAFE.putByte(bytes, Unsafe.ARRAY_BYTE_BASE_OFFSET + off + 4, (byte) '4');

> * PR proposes a mix of snake & camel case while the code around uses camel case. Worth considering grouping similar benchmarks into an inner class.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2762963689

From vlivanov at openjdk.org Sat Mar 29 01:35:15 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Sat, 29 Mar 2025 01:35:15 GMT
Subject: RFR: 8351233: [ASAN] avx2-emu-funcs.hpp:151:20: error: ‘D.82188’ is used uninitialized
In-Reply-To:
References:
Message-ID: <7o6aKz9BB_Xy5fyAHMvURof9Zp1p5qIbzikO3d0arlc=.5966f24a-b244-4b88-936b-4d3d99953375@github.com>

On Thu, 6 Mar 2025 03:35:20 GMT, SendaoYan wrote:

> Hi all,
>
> The return type of function `const __m256i &perm` is `__m256i`, so `const __m256i &perm` should be replaced as 'const __m256i perm'.
>
> The function implementation in gcc/clang compiler header:
>
> 1. gcc: lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h
>
>
> extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> _mm256_loadu_si256 (__m256i_u const *__P)
> {
>   return *__P;
> }
>
>
> 2. clang: lib64/clang/17/include/avxintrin.h
>
>
> static __inline __m256i __DEFAULT_FN_ATTRS
> _mm256_loadu_si256(__m256i_u const *__p)
> {
>   struct __loadu_si256 {
>     __m256i_u __v;
>   } __attribute__((__packed__, __may_alias__));
>   return ((const struct __loadu_si256*)__p)->__v;
> }
>
>
> Additional testing:
>
> - [x] jtreg tests(include tier1/2/3 etc.) on linux-x64(AMD EPYC 9T24 96-Core Processor) with release build
> - [x] jtreg tests(include tier1/2/3 etc.) on linux-x64(AMD EPYC 9T24 96-Core Processor) with fastdebug build

Testing results (hs-tier1 - hs-tier4) are clean.

-------------

Marked as reviewed by vlivanov (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23925#pullrequestreview-2727171483 From vlivanov at openjdk.org Sat Mar 29 01:33:16 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 29 Mar 2025 01:33:16 GMT Subject: RFR: 8352316: More MergeStoreBench In-Reply-To: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: On Wed, 19 Mar 2025 03:28:59 GMT, Shaojin Wen wrote: > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works Ok, I don't have anything against a fixed string constant. But existing names (`NULL_STR` et al, `putNull`) add confusion IMO (especially, when there's Unsafe in play). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2762978817 From fyang at openjdk.org Sat Mar 29 02:07:51 2025 From: fyang at openjdk.org (Fei Yang) Date: Sat, 29 Mar 2025 02:07:51 GMT Subject: RFR: 8353219: RISC-V: Fix client builds after JDK-8345298 Message-ID: Hi, please review this trivial change fixing a client build issue. The definitions of both `generate_float16ToFloat()` and `generate_floatToFloat16()` should be moved out of `COMPILER2_OR_JVMCI` macro scope. Testing: client builds fine on linux-riscv64 with this change. 
------------- Commit messages: - 8353219: RISC-V: Fix client builds after JDK-8345298 Changes: https://git.openjdk.org/jdk/pull/24307/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24307&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353219 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24307.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24307/head:pull/24307 PR: https://git.openjdk.org/jdk/pull/24307 From swen at openjdk.org Sat Mar 29 02:51:49 2025 From: swen at openjdk.org (Shaojin Wen) Date: Sat, 29 Mar 2025 02:51:49 GMT Subject: RFR: 8352316: More MergeStoreBench [v2] In-Reply-To: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: fix naming ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24108/files - new: https://git.openjdk.org/jdk/pull/24108/files/23dba8c5..48d1f3af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24108&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24108&range=00-01 Stats: 20 lines in 1 file changed: 0 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/24108.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24108/head:pull/24108 PR: https://git.openjdk.org/jdk/pull/24108 From swen at openjdk.org Sat Mar 29 03:15:31 2025 From: swen at openjdk.org (Shaojin Wen) Date: Sat, 29 Mar 2025 03:15:31 GMT Subject: RFR: 8352316: More MergeStoreBench [v3] In-Reply-To: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> References: 
<5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works Shaojin Wen has updated the pull request incrementally with two additional commits since the last revision: - bug fix for str5Utf16ArrayCopy - bug fix & fix comments & add str5 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24108/files - new: https://git.openjdk.org/jdk/pull/24108/files/48d1f3af..4ab0c1ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24108&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24108&range=01-02 Stats: 110 lines in 1 file changed: 105 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24108.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24108/head:pull/24108 PR: https://git.openjdk.org/jdk/pull/24108 From fjiang at openjdk.org Sat Mar 29 03:19:07 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Sat, 29 Mar 2025 03:19:07 GMT Subject: RFR: 8353219: RISC-V: Fix client builds after JDK-8345298 In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 02:01:17 GMT, Fei Yang wrote: > Hi, please review this trivial change fixing a client build issue. > The definitions of both `generate_float16ToFloat()` and `generate_floatToFloat16()` should be moved out of `COMPILER2_OR_JVMCI` macro scope. Testing: client builds fine on linux-riscv64 with this change. Marked as reviewed by fjiang (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24307#pullrequestreview-2727230943 From swen at openjdk.org Sat Mar 29 04:19:19 2025 From: swen at openjdk.org (Shaojin Wen) Date: Sat, 29 Mar 2025 04:19:19 GMT Subject: RFR: 8352316: More MergeStoreBench [v4] In-Reply-To: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: bug fix & add str benchmark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24108/files - new: https://git.openjdk.org/jdk/pull/24108/files/4ab0c1ed..f1fb0dfc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24108&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24108&range=02-03 Stats: 158 lines in 1 file changed: 136 ins; 1 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/24108.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24108/head:pull/24108 PR: https://git.openjdk.org/jdk/pull/24108 From swen at openjdk.org Sat Mar 29 04:59:24 2025 From: swen at openjdk.org (Shaojin Wen) Date: Sat, 29 Mar 2025 04:59:24 GMT Subject: RFR: 8352316: More MergeStoreBench [v5] In-Reply-To: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: disable auto 
vector & more benchmark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24108/files - new: https://git.openjdk.org/jdk/pull/24108/files/f1fb0dfc..a5eb3b98 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24108&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24108&range=03-04 Stats: 155 lines in 1 file changed: 121 ins; 0 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/24108.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24108/head:pull/24108 PR: https://git.openjdk.org/jdk/pull/24108 From swen at openjdk.org Sat Mar 29 05:21:30 2025 From: swen at openjdk.org (Shaojin Wen) Date: Sat, 29 Mar 2025 05:21:30 GMT Subject: RFR: 8352316: More MergeStoreBench In-Reply-To: References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: On Sat, 29 Mar 2025 01:30:01 GMT, Vladimir Ivanov wrote: >> Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works > > Ok, I don't have anything against a fixed string constant. But existing names (`NULL_STR` et al, `putNull`) add confusion IMO (especially, when there's Unsafe in play). Following @iwanowww's suggestion, I renamed putNull to str4 and added benchmarks for str5 and str7. The new performance numbers below show that ArraySetConst and UnsafePut perform best. # 1. Script git remote add wenshao git@github.com:wenshao/jdk.git git fetch wenshao git checkout a5eb3b98ece8cf1aa6eaa3d1287148e1b0510f4b make test TEST="micro:vm.compiler.MergeStoreBench.str" # 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC? Genoa) Benchmark Mode Cnt Score Error Units MergeStoreBench.str4ArraySetConst avgt 5 1343.148 ? 3.995 ns/op MergeStoreBench.str4Arraycopy avgt 5 7293.298 ? 35.868 ns/op MergeStoreBench.str4GetBytes avgt 5 6175.505 ? 17.465 ns/op MergeStoreBench.str4GetChars avgt 5 13954.105 ? 
1561.080 ns/op MergeStoreBench.str4StringBuilder avgt 5 15633.944 ? 4579.011 ns/op MergeStoreBench.str4UnsafePut avgt 5 1325.916 ? 6.126 ns/op MergeStoreBench.str4Utf16ArrayCopy avgt 5 13998.302 ? 938.311 ns/op MergeStoreBench.str4Utf16ArraySetConst avgt 5 1514.040 ? 6.774 ns/op MergeStoreBench.str4Utf16StringBuilder avgt 5 16382.059 ? 4943.649 ns/op MergeStoreBench.str4Utf16UnsafePut avgt 5 1616.452 ? 9.472 ns/op MergeStoreBench.str5ArraySetConst avgt 5 2609.046 ? 28.409 ns/op MergeStoreBench.str5Arraycopy avgt 5 9519.887 ? 54.364 ns/op MergeStoreBench.str5GetBytes avgt 5 5987.410 ? 14.277 ns/op MergeStoreBench.str5GetChars avgt 5 13598.285 ? 241.078 ns/op MergeStoreBench.str5StringBuilder avgt 5 16556.510 ? 2962.211 ns/op MergeStoreBench.str5UnsafePut avgt 5 2431.841 ? 24.299 ns/op MergeStoreBench.str5Utf16ArrayCopy avgt 5 21433.158 ? 131.466 ns/op MergeStoreBench.str5Utf16ArraySetConst avgt 5 2935.785 ? 3.777 ns/op MergeStoreBench.str5Utf16StringBuilder avgt 5 18746.936 ? 3680.162 ns/op MergeStoreBench.str5Utf16UnsafePut avgt 5 2878.038 ? 10.055 ns/op MergeStoreBench.str7ArraySetConst avgt 5 3594.628 ? 24.397 ns/op MergeStoreBench.str7Arraycopy avgt 5 12314.423 ? 81.095 ns/op MergeStoreBench.str7GetBytes avgt 5 9014.943 ? 222.911 ns/op MergeStoreBench.str7GetChars avgt 5 16866.491 ? 178.543 ns/op MergeStoreBench.str7StringBuilder avgt 5 25238.440 ? 2757.460 ns/op MergeStoreBench.str7UnsafePut avgt 5 3597.008 ? 26.531 ns/op MergeStoreBench.str7Utf16ArrayCopy avgt 5 21325.797 ? 111.975 ns/op MergeStoreBench.str7Utf16ArraySetConst avgt 5 3934.164 ? 97.003 ns/op MergeStoreBench.str7Utf16StringBuilder avgt 5 19315.320 ? 1960.379 ns/op MergeStoreBench.str7Utf16UnsafePut avgt 5 4190.362 ? 8.042 ns/op # 3. aliyun_ecs_c8i_x64 (CPU Intel?Xeon?Emerald Rapids) Benchmark Mode Cnt Score Error Units MergeStoreBench.str4ArraySetConst avgt 5 1558.348 ? 0.959 ns/op MergeStoreBench.str4Arraycopy avgt 5 5837.069 ? 3.166 ns/op MergeStoreBench.str4GetBytes avgt 5 5875.195 ? 
12.562 ns/op MergeStoreBench.str4GetChars avgt 5 12679.307 ? 62.069 ns/op MergeStoreBench.str4StringBuilder avgt 5 16588.064 ? 75.515 ns/op MergeStoreBench.str4UnsafePut avgt 5 1543.947 ? 4.780 ns/op MergeStoreBench.str4Utf16ArrayCopy avgt 5 13973.910 ? 329.196 ns/op MergeStoreBench.str4Utf16ArraySetConst avgt 5 2591.923 ? 6.758 ns/op MergeStoreBench.str4Utf16StringBuilder avgt 5 17719.390 ? 5016.367 ns/op MergeStoreBench.str4Utf16UnsafePut avgt 5 2539.849 ? 8.091 ns/op MergeStoreBench.str5ArraySetConst avgt 5 3004.459 ? 9.575 ns/op MergeStoreBench.str5Arraycopy avgt 5 7153.397 ? 52.069 ns/op MergeStoreBench.str5GetBytes avgt 5 5566.344 ? 4.400 ns/op MergeStoreBench.str5GetChars avgt 5 14444.069 ? 224.157 ns/op MergeStoreBench.str5StringBuilder avgt 5 18371.573 ? 293.271 ns/op MergeStoreBench.str5UnsafePut avgt 5 2879.242 ? 9.412 ns/op MergeStoreBench.str5Utf16ArrayCopy avgt 5 4548.225 ? 14.172 ns/op MergeStoreBench.str5Utf16ArraySetConst avgt 5 3864.536 ? 4.208 ns/op MergeStoreBench.str5Utf16StringBuilder avgt 5 20413.600 ? 1513.652 ns/op MergeStoreBench.str5Utf16UnsafePut avgt 5 3858.928 ? 2.923 ns/op MergeStoreBench.str7ArraySetConst avgt 5 4658.730 ? 4.558 ns/op MergeStoreBench.str7Arraycopy avgt 5 12130.150 ? 13.268 ns/op MergeStoreBench.str7GetBytes avgt 5 11941.311 ? 201.509 ns/op MergeStoreBench.str7GetChars avgt 5 21081.423 ? 1892.526 ns/op MergeStoreBench.str7StringBuilder avgt 5 14661.312 ? 768.749 ns/op MergeStoreBench.str7UnsafePut avgt 5 4662.649 ? 2.974 ns/op MergeStoreBench.str7Utf16ArrayCopy avgt 5 4973.827 ? 2.841 ns/op MergeStoreBench.str7Utf16ArraySetConst avgt 5 5407.768 ? 19.989 ns/op MergeStoreBench.str7Utf16StringBuilder avgt 5 25378.418 ? 9377.505 ns/op MergeStoreBench.str7Utf16UnsafePut avgt 5 5494.466 ? 5.377 ns/op # 4. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710) Benchmark Mode Cnt Score Error Units MergeStoreBench.str4ArraySetConst avgt 5 2232.675 ? 0.858 ns/op MergeStoreBench.str4Arraycopy avgt 5 8342.762 ? 
22.772 ns/op MergeStoreBench.str4GetBytes avgt 5 6988.049 ? 11.874 ns/op MergeStoreBench.str4GetChars avgt 5 12363.100 ? 30.414 ns/op MergeStoreBench.str4StringBuilder avgt 5 21257.805 ? 1371.310 ns/op MergeStoreBench.str4UnsafePut avgt 5 2234.198 ? 1.698 ns/op MergeStoreBench.str4Utf16ArrayCopy avgt 5 16381.011 ? 102.719 ns/op MergeStoreBench.str4Utf16ArraySetConst avgt 5 3109.010 ? 8.955 ns/op MergeStoreBench.str4Utf16StringBuilder avgt 5 22010.040 ? 908.358 ns/op MergeStoreBench.str4Utf16UnsafePut avgt 5 2868.544 ? 12.469 ns/op MergeStoreBench.str5ArraySetConst avgt 5 3780.322 ? 5.041 ns/op MergeStoreBench.str5Arraycopy avgt 5 10649.712 ? 39.440 ns/op MergeStoreBench.str5GetBytes avgt 5 6612.562 ? 7.260 ns/op MergeStoreBench.str5GetChars avgt 5 15521.451 ? 157.817 ns/op MergeStoreBench.str5StringBuilder avgt 5 22938.577 ? 1814.071 ns/op MergeStoreBench.str5UnsafePut avgt 5 3769.850 ? 0.524 ns/op MergeStoreBench.str5Utf16ArrayCopy avgt 5 5832.413 ? 5.256 ns/op MergeStoreBench.str5Utf16ArraySetConst avgt 5 4644.579 ? 41.694 ns/op MergeStoreBench.str5Utf16StringBuilder avgt 5 26369.411 ? 8050.710 ns/op MergeStoreBench.str5Utf16UnsafePut avgt 5 4497.980 ? 42.817 ns/op MergeStoreBench.str7ArraySetConst avgt 5 5913.136 ? 12.055 ns/op MergeStoreBench.str7Arraycopy avgt 5 14427.669 ? 80.229 ns/op MergeStoreBench.str7GetBytes avgt 5 11712.364 ? 13.206 ns/op MergeStoreBench.str7GetChars avgt 5 21309.046 ? 519.416 ns/op MergeStoreBench.str7StringBuilder avgt 5 18882.777 ? 2659.525 ns/op MergeStoreBench.str7UnsafePut avgt 5 5926.995 ? 11.841 ns/op MergeStoreBench.str7Utf16ArrayCopy avgt 5 6362.405 ? 5.381 ns/op MergeStoreBench.str7Utf16ArraySetConst avgt 5 4339.133 ? 2.066 ns/op MergeStoreBench.str7Utf16StringBuilder avgt 5 30761.366 ? 13408.497 ns/op MergeStoreBench.str7Utf16UnsafePut avgt 5 6345.575 ? 
128.697 ns/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2763108910 From duke at openjdk.org Sat Mar 29 05:31:14 2025 From: duke at openjdk.org (duke) Date: Sat, 29 Mar 2025 05:31:14 GMT Subject: RFR: 8317976: Optimize SIMD sort for AMD Zen 4 [v3] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 06:37:54 GMT, Rohit Arul Raj wrote: >> In JDK-8309130, Array sort was optimized using AVX512 SIMD instructions for x86_64. Currently, this optimization has been disabled for AMD Zen 4 [JDK-8317763] due to bad performance of compressstoreu. >> Ref: https://www.reddit.com/r/java/comments/171t5sj/heads_up_openjdk_implementation_of_avx512_based/. >> >> This patch enables Zen 4 to pick optimized AVX2 version of SIMD sort and Zen 5 picks the AVX512 version. >> >> JTREG Tests: Completed Tier1 & Tier2 tests on Zen4 & Zen5 - No Regressions. >> >> Attaching ArraySort performance data for Zen4 & Zen5. >> [Zen4-ArraySort-Data.txt](https://github.com/user-attachments/files/19245831/Zen4-ArraySort-Data.txt) >> [Zen5-ArraySort-Data.txt](https://github.com/user-attachments/files/19245833/Zen5-ArraySort-Data.txt) > > Rohit Arul Raj has updated the pull request incrementally with one additional commit since the last revision: > > Refactor 'supports_avx512_simd_sort' code to make it easily readable @rohitarulraj Your change (at version b369de6f9b0327a2090f9ea44f11ff2940e7095a) is now ready to be sponsored by a Committer. 
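A rough sketch of the selection policy described in the quoted text (Zen 4 takes the AVX2 sort, Zen 5 takes the AVX512 sort). The `CpuInfo` struct, the `SortKernel` enum and both functions are hypothetical and are not the HotSpot `VM_Version` API; the family numbers assume that Zen 4 reports family 0x19 and Zen 5 reports family 0x1A.

```cpp
// Which sort implementation a runtime might select (hypothetical names).
enum class SortKernel { Scalar, Avx2, Avx512 };

// Hypothetical summary of the CPU features that matter for the choice.
struct CpuInfo {
  bool is_amd;
  unsigned family;     // assumed: 0x19 covers Zen 4, 0x1A is Zen 5
  bool has_avx2;
  bool has_avx512dq;
};

// AVX512 sort is used only where it is expected to be fast: never without
// AVX512DQ, and not on AMD parts up to and including family 0x19, where
// compressstoreu was reported to perform badly.
static bool use_avx512_sort(const CpuInfo& cpu) {
  if (!cpu.has_avx512dq) return false;
  if (cpu.is_amd && cpu.family <= 0x19) return false;
  return true;
}

// Fall back from AVX512 to AVX2 to scalar code.
static SortKernel pick_sort_kernel(const CpuInfo& cpu) {
  if (use_avx512_sort(cpu)) return SortKernel::Avx512;
  if (cpu.has_avx2) return SortKernel::Avx2;
  return SortKernel::Scalar;
}
```

Keeping the predicate separate from the dispatch mirrors the "easily readable" refactoring mentioned in the quoted commit note, although the real condition may differ.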
------------- PR Comment: https://git.openjdk.org/jdk/pull/24053#issuecomment-2763111988 From swen at openjdk.org Sat Mar 29 05:56:47 2025 From: swen at openjdk.org (Shaojin Wen) Date: Sat, 29 Mar 2025 05:56:47 GMT Subject: RFR: 8352316: More MergeStoreBench [v6] In-Reply-To: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works Shaojin Wen has updated the pull request incrementally with two additional commits since the last revision: - appendChar - bug fix for str5ArraySetConst ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24108/files - new: https://git.openjdk.org/jdk/pull/24108/files/a5eb3b98..322624eb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24108&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24108&range=04-05 Stats: 109 lines in 1 file changed: 104 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24108.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24108/head:pull/24108 PR: https://git.openjdk.org/jdk/pull/24108 From duke at openjdk.org Sat Mar 29 07:19:21 2025 From: duke at openjdk.org (Zihao Lin) Date: Sat, 29 Mar 2025 07:19:21 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v5] In-Reply-To: References: Message-ID: > This patch removes the slice parameter from LoadNode::make. > > Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new and would appreciate any guidance. Thanks a lot! 
Zihao Lin has updated the pull request incrementally with two additional commits since the last revision: - Fix build - Fix test failed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24258/files - new: https://git.openjdk.org/jdk/pull/24258/files/f6b2fbec..a1924c35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=03-04 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From swen at openjdk.org Sat Mar 29 07:27:24 2025 From: swen at openjdk.org (Shaojin Wen) Date: Sat, 29 Mar 2025 07:27:24 GMT Subject: RFR: 8352316: More MergeStoreBench [v7] In-Reply-To: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: add StringBuilderUnsafePut ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24108/files - new: https://git.openjdk.org/jdk/pull/24108/files/322624eb..cd1d8fb3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24108&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24108&range=05-06 Stats: 245 lines in 1 file changed: 211 ins; 0 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/24108.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24108/head:pull/24108 PR: https://git.openjdk.org/jdk/pull/24108 From swen at openjdk.org Sat Mar 29 07:47:08 2025 From: swen at openjdk.org (Shaojin Wen) Date: Sat, 29 Mar 2025 07:47:08 GMT Subject: 
RFR: 8352316: More MergeStoreBench [v7] In-Reply-To: References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: <-xsgQ8uhc8vksHhI4Elu3SwNqy8GEQdzCdB3SAsPQa0=.9ef939ee-6359-40cb-8663-dabaad6611b6@github.com> On Sat, 29 Mar 2025 07:27:24 GMT, Shaojin Wen wrote: >> Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works > > Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: > > add StringBuilderUnsafePut I added a new scenario, `StringBuilderUnsafePut`, which uses Unsafe to modify the StringBuilder directly in order to append constants. The performance numbers below show that ArraySetConst, StringBuilderUnsafePut and UnsafePut perform best. They also suggest that arraycopy from a Stable Value has considerable optimization potential that is worth pursuing in C2. # 1. Script git remote add wenshao git@github.com:wenshao/jdk.git git fetch wenshao git checkout cd1d8fb3b137a741446c894d1893e7180535ce8f make test TEST="micro:vm.compiler.MergeStoreBench.str" # 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC? Genoa) Benchmark Mode Cnt Score Error Units MergeStoreBench.str4ArraySetConst avgt 5 1338.414 ? 3.209 ns/op MergeStoreBench.str4Arraycopy avgt 5 7271.203 ? 19.400 ns/op MergeStoreBench.str4GetBytes avgt 5 6154.684 ? 9.910 ns/op MergeStoreBench.str4GetChars avgt 5 14078.790 ? 59.175 ns/op MergeStoreBench.str4StringBuilder avgt 5 15766.528 ? 4634.119 ns/op MergeStoreBench.str4StringBuilderAppendChar avgt 5 41388.364 ? 9871.409 ns/op MergeStoreBench.str4StringBuilderUnsafePut avgt 5 1575.792 ? 4.102 ns/op MergeStoreBench.str4UnsafePut avgt 5 1326.499 ? 2.400 ns/op MergeStoreBench.str4Utf16ArrayCopy avgt 5 13949.307 ? 1045.255 ns/op MergeStoreBench.str4Utf16ArraySetConst avgt 5 1511.967 ? 5.250 ns/op MergeStoreBench.str4Utf16StringBuilder avgt 5 18030.261 ? 
1656.463 ns/op MergeStoreBench.str4Utf16StringBuilderAppendChar avgt 5 35047.855 ? 16674.635 ns/op MergeStoreBench.str4Utf16StringBuilderUnsafePut avgt 5 2785.792 ? 5.571 ns/op MergeStoreBench.str4Utf16UnsafePut avgt 5 1613.812 ? 1.249 ns/op MergeStoreBench.str5ArraySetConst avgt 5 2599.310 ? 8.667 ns/op MergeStoreBench.str5Arraycopy avgt 5 9487.926 ? 29.234 ns/op MergeStoreBench.str5GetBytes avgt 5 5972.453 ? 16.035 ns/op MergeStoreBench.str5GetChars avgt 5 13516.943 ? 10.978 ns/op MergeStoreBench.str5StringBuilder avgt 5 16539.070 ? 3097.339 ns/op MergeStoreBench.str5StringBuilderAppendChar avgt 5 50506.770 ? 11536.414 ns/op MergeStoreBench.str5StringBuilderUnsafePut avgt 5 2653.493 ? 7.397 ns/op MergeStoreBench.str5UnsafePut avgt 5 2431.003 ? 10.690 ns/op MergeStoreBench.str5Utf16ArrayCopy avgt 5 20949.585 ? 1128.737 ns/op MergeStoreBench.str5Utf16ArraySetConst avgt 5 2933.045 ? 5.864 ns/op MergeStoreBench.str5Utf16StringBuilder avgt 5 21769.670 ? 4910.378 ns/op MergeStoreBench.str5Utf16StringBuilderAppendChar avgt 5 47491.137 ? 15262.349 ns/op MergeStoreBench.str5Utf16StringBuilderUnsafePut avgt 5 2652.690 ? 5.348 ns/op MergeStoreBench.str5Utf16UnsafePut avgt 5 2871.860 ? 5.845 ns/op MergeStoreBench.str7ArraySetConst avgt 5 3583.059 ? 22.359 ns/op MergeStoreBench.str7Arraycopy avgt 5 12289.685 ? 14.769 ns/op MergeStoreBench.str7GetBytes avgt 5 8968.316 ? 34.194 ns/op MergeStoreBench.str7GetChars avgt 5 16792.196 ? 72.787 ns/op MergeStoreBench.str7StringBuilder avgt 5 25231.342 ? 2851.998 ns/op MergeStoreBench.str7StringBuilderAppendChar avgt 5 67351.162 ? 51.074 ns/op MergeStoreBench.str7StringBuilderUnsafePut avgt 5 3397.856 ? 7.576 ns/op MergeStoreBench.str7UnsafePut avgt 5 3578.465 ? 3.344 ns/op MergeStoreBench.str7Utf16ArrayCopy avgt 5 21314.607 ? 117.545 ns/op MergeStoreBench.str7Utf16ArraySetConst avgt 5 3915.540 ? 7.042 ns/op MergeStoreBench.str7Utf16StringBuilder avgt 5 21113.390 ? 
1452.353 ns/op MergeStoreBench.str7Utf16StringBuilderAppendChar avgt 5 79597.044 ? 176.197 ns/op MergeStoreBench.str7Utf16StringBuilderUnsafePut avgt 5 6413.179 ? 11.302 ns/op MergeStoreBench.str7Utf16UnsafePut avgt 5 4180.867 ? 7.475 ns/op # 3. aliyun_ecs_c8i_x64 (CPU Intel?Xeon?Emerald Rapids) Benchmark Mode Cnt Score Error Units MergeStoreBench.str4ArraySetConst avgt 5 1558.502 ? 2.989 ns/op MergeStoreBench.str4Arraycopy avgt 5 5855.148 ? 10.116 ns/op MergeStoreBench.str4GetBytes avgt 5 5874.873 ? 3.767 ns/op MergeStoreBench.str4GetChars avgt 5 12674.479 ? 103.618 ns/op MergeStoreBench.str4StringBuilder avgt 5 16564.323 ? 229.666 ns/op MergeStoreBench.str4StringBuilderAppendChar avgt 5 39590.870 ? 14968.244 ns/op MergeStoreBench.str4StringBuilderUnsafePut avgt 5 1797.398 ? 3.972 ns/op MergeStoreBench.str4UnsafePut avgt 5 1547.226 ? 1.950 ns/op MergeStoreBench.str4Utf16ArrayCopy avgt 5 13984.076 ? 332.735 ns/op MergeStoreBench.str4Utf16ArraySetConst avgt 5 2592.408 ? 5.338 ns/op MergeStoreBench.str4Utf16StringBuilder avgt 5 18244.127 ? 2436.822 ns/op MergeStoreBench.str4Utf16StringBuilderAppendChar avgt 5 36861.665 ? 10735.884 ns/op MergeStoreBench.str4Utf16StringBuilderUnsafePut avgt 5 3103.648 ? 0.809 ns/op MergeStoreBench.str4Utf16UnsafePut avgt 5 2539.181 ? 11.556 ns/op MergeStoreBench.str5ArraySetConst avgt 5 3006.719 ? 4.606 ns/op MergeStoreBench.str5Arraycopy avgt 5 7152.151 ? 27.593 ns/op MergeStoreBench.str5GetBytes avgt 5 5572.568 ? 9.664 ns/op MergeStoreBench.str5GetChars avgt 5 14478.429 ? 597.483 ns/op MergeStoreBench.str5StringBuilder avgt 5 18249.007 ? 359.685 ns/op MergeStoreBench.str5StringBuilderAppendChar avgt 5 48156.310 ? 21354.806 ns/op MergeStoreBench.str5StringBuilderUnsafePut avgt 5 3039.131 ? 5.040 ns/op MergeStoreBench.str5UnsafePut avgt 5 2885.440 ? 4.323 ns/op MergeStoreBench.str5Utf16ArrayCopy avgt 5 4648.957 ? 115.805 ns/op MergeStoreBench.str5Utf16ArraySetConst avgt 5 3862.566 ? 
3.036 ns/op MergeStoreBench.str5Utf16StringBuilder avgt 5 24592.386 ? 6936.461 ns/op MergeStoreBench.str5Utf16StringBuilderAppendChar avgt 5 44162.880 ? 36224.171 ns/op MergeStoreBench.str5Utf16StringBuilderUnsafePut avgt 5 3042.734 ? 9.256 ns/op MergeStoreBench.str5Utf16UnsafePut avgt 5 3858.479 ? 2.273 ns/op MergeStoreBench.str7ArraySetConst avgt 5 4656.166 ? 3.053 ns/op MergeStoreBench.str7Arraycopy avgt 5 12139.304 ? 10.065 ns/op MergeStoreBench.str7GetBytes avgt 5 11909.980 ? 14.371 ns/op MergeStoreBench.str7GetChars avgt 5 20885.722 ? 3159.820 ns/op MergeStoreBench.str7StringBuilder avgt 5 14813.587 ? 354.177 ns/op MergeStoreBench.str7StringBuilderAppendChar avgt 5 61647.309 ? 153.877 ns/op MergeStoreBench.str7StringBuilderUnsafePut avgt 5 4256.645 ? 1.095 ns/op MergeStoreBench.str7UnsafePut avgt 5 4662.482 ? 2.893 ns/op MergeStoreBench.str7Utf16ArrayCopy avgt 5 4939.354 ? 12.117 ns/op MergeStoreBench.str7Utf16ArraySetConst avgt 5 5401.214 ? 5.342 ns/op MergeStoreBench.str7Utf16StringBuilder avgt 5 25070.599 ? 8313.323 ns/op MergeStoreBench.str7Utf16StringBuilderAppendChar avgt 5 84853.104 ? 210.843 ns/op MergeStoreBench.str7Utf16StringBuilderUnsafePut avgt 5 5290.793 ? 21.012 ns/op MergeStoreBench.str7Utf16UnsafePut avgt 5 5502.576 ? 11.820 ns/op # 4. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710) Benchmark Mode Cnt Score Error Units MergeStoreBench.str4ArraySetConst avgt 5 2229.455 ? 2.024 ns/op MergeStoreBench.str4Arraycopy avgt 5 8323.527 ? 60.470 ns/op MergeStoreBench.str4GetBytes avgt 5 7008.143 ? 6.658 ns/op MergeStoreBench.str4GetChars avgt 5 12343.528 ? 6.584 ns/op MergeStoreBench.str4StringBuilder avgt 5 21238.814 ? 1410.339 ns/op MergeStoreBench.str4StringBuilderAppendChar avgt 5 68667.406 ? 720.511 ns/op MergeStoreBench.str4StringBuilderUnsafePut avgt 5 2281.267 ? 1.324 ns/op MergeStoreBench.str4UnsafePut avgt 5 2230.367 ? 0.626 ns/op MergeStoreBench.str4Utf16ArrayCopy avgt 5 16338.896 ? 
74.446 ns/op MergeStoreBench.str4Utf16ArraySetConst avgt 5 3098.749 ? 35.606 ns/op MergeStoreBench.str4Utf16StringBuilder avgt 5 21491.710 ? 2598.145 ns/op MergeStoreBench.str4Utf16StringBuilderAppendChar avgt 5 67748.629 ? 2224.953 ns/op MergeStoreBench.str4Utf16StringBuilderUnsafePut avgt 5 3840.268 ? 2.786 ns/op MergeStoreBench.str4Utf16UnsafePut avgt 5 2858.839 ? 46.434 ns/op MergeStoreBench.str5ArraySetConst avgt 5 3769.990 ? 2.877 ns/op MergeStoreBench.str5Arraycopy avgt 5 10604.229 ? 85.266 ns/op MergeStoreBench.str5GetBytes avgt 5 6604.073 ? 4.599 ns/op MergeStoreBench.str5GetChars avgt 5 15499.577 ? 166.819 ns/op MergeStoreBench.str5StringBuilder avgt 5 22817.332 ? 1330.696 ns/op MergeStoreBench.str5StringBuilderAppendChar avgt 5 86993.698 ? 419.806 ns/op MergeStoreBench.str5StringBuilderUnsafePut avgt 5 3803.737 ? 0.974 ns/op MergeStoreBench.str5UnsafePut avgt 5 3765.698 ? 1.774 ns/op MergeStoreBench.str5Utf16ArrayCopy avgt 5 5691.730 ? 4.200 ns/op MergeStoreBench.str5Utf16ArraySetConst avgt 5 4620.050 ? 73.237 ns/op MergeStoreBench.str5Utf16StringBuilder avgt 5 26974.200 ? 9799.822 ns/op MergeStoreBench.str5Utf16StringBuilderAppendChar avgt 5 84214.630 ? 1770.595 ns/op MergeStoreBench.str5Utf16StringBuilderUnsafePut avgt 5 3803.749 ? 2.164 ns/op MergeStoreBench.str5Utf16UnsafePut avgt 5 4463.146 ? 94.255 ns/op MergeStoreBench.str7ArraySetConst avgt 5 5905.221 ? 17.324 ns/op MergeStoreBench.str7Arraycopy avgt 5 14400.712 ? 68.866 ns/op MergeStoreBench.str7GetBytes avgt 5 11693.448 ? 11.413 ns/op MergeStoreBench.str7GetChars avgt 5 21262.620 ? 393.963 ns/op MergeStoreBench.str7StringBuilder avgt 5 21559.944 ? 97.469 ns/op MergeStoreBench.str7StringBuilderAppendChar avgt 5 120774.017 ? 927.175 ns/op MergeStoreBench.str7StringBuilderUnsafePut avgt 5 5520.405 ? 5.431 ns/op MergeStoreBench.str7UnsafePut avgt 5 5918.814 ? 8.237 ns/op MergeStoreBench.str7Utf16ArrayCopy avgt 5 6348.146 ? 2.766 ns/op MergeStoreBench.str7Utf16ArraySetConst avgt 5 4333.009 ? 
1.980 ns/op MergeStoreBench.str7Utf16StringBuilder avgt 5 29406.714 ? 9703.134 ns/op MergeStoreBench.str7Utf16StringBuilderAppendChar avgt 5 117801.880 ? 811.216 ns/op MergeStoreBench.str7Utf16StringBuilderUnsafePut avgt 5 6684.164 ? 16.496 ns/op MergeStoreBench.str7Utf16UnsafePut avgt 5 6286.796 ? 316.658 ns/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2763215404 From duke at openjdk.org Sun Mar 30 02:39:27 2025 From: duke at openjdk.org (Johannes Graham) Date: Sun, 30 Mar 2025 02:39:27 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v45] In-Reply-To: References: Message-ID: > An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. > > This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. > > In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: > - Bounds optimization of xor > - A check for `x ^ x = 0` > - Explicit testing of xor over booleans. > > Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. Johannes Graham has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 61 commits: - Merge branch 'openjdk:master' into xor_const - move code into a header file that can be shared with GTEST - Undo accidental changes to Int tests - Add random range tests for Long - Merge branch 'openjdk:master' into xor_const - Merge branch 'openjdk:master' into xor_const - invert comparison in tests - update bug numbers and summary - add test of random ranges - consistency - ... and 51 more: https://git.openjdk.org/jdk/compare/3d2c3cd4...ce17608b ------------- Changes: https://git.openjdk.org/jdk/pull/23089/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=44 Stats: 624 lines in 6 files changed: 572 ins; 26 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/23089.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089 PR: https://git.openjdk.org/jdk/pull/23089 From duke at openjdk.org Sun Mar 30 02:48:41 2025 From: duke at openjdk.org (Johannes Graham) Date: Sun, 30 Mar 2025 02:48:41 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v45] In-Reply-To: References: Message-ID: <4zL5QCptTs6BOiy9-2k9ntfFXxM5t5-GoMz3qP7-W-s=.b4e283aa-0ff7-451b-8c9d-e15649858c2c@github.com> On Sun, 30 Mar 2025 02:39:27 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. 
>> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. > > Johannes Graham has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 61 commits: > > - Merge branch 'openjdk:master' into xor_const > - move code into a header file that can be shared with GTEST > - Undo accidental changes to Int tests > - Add random range tests for Long > - Merge branch 'openjdk:master' into xor_const > - Merge branch 'openjdk:master' into xor_const > - invert comparison in tests > - update bug numbers and summary > - add test of random ranges > - consistency > - ... and 51 more: https://git.openjdk.org/jdk/compare/3d2c3cd4...ce17608b The naming of that method evolved during the course of the review of this PR. I believe the thinking was that the check was not necessarily an overall upper bound, and a simpler name would imply it was more general. I have separated the logic out into its own header file, but I've put it under `opto`. `utilities` felt too far away from where it was used. I've called it `addnodeXorUtil.hpp` with the thinking that it could become the home for more code - it's a pretty small piece of code to have its own header. Other naming suggestions would be welcomed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2764352131 From duke at openjdk.org Sun Mar 30 03:14:32 2025 From: duke at openjdk.org (Johannes Graham) Date: Sun, 30 Mar 2025 03:14:32 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v46] In-Reply-To: References: Message-ID: > An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. > > This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. > > In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: > - Bounds optimization of xor > - A check for `x ^ x = 0` > - Explicit testing of xor over booleans. > > Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. 
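The bounds reasoning in the description above can be sketched in a few lines. The function below is an illustrative assumption, not the code from the PR: for non-negative operands it returns the all-ones mask that covers the highest set bit of either upper bound, and a second function checks that bound exhaustively over 4-bit ranges, in the spirit of the gtest the description mentions.

```cpp
#include <cstdint>

// Upper bound for a ^ b when 0 <= a <= hi_a and 0 <= b <= hi_b.
// Any bit above the highest set bit of (hi_a | hi_b) is zero in both
// operands, so it is zero in the result; every lower bit may be set.
static uint32_t xor_upper_bound(uint32_t hi_a, uint32_t hi_b) {
  uint32_t mask = hi_a | hi_b;
  // Smear the highest set bit into all lower bit positions.
  mask |= mask >> 1;
  mask |= mask >> 2;
  mask |= mask >> 4;
  mask |= mask >> 8;
  mask |= mask >> 16;
  return mask;
}

// Exhaustively verify the bound for every pair of 4-bit upper bounds and
// every pair of values inside the corresponding ranges.
static bool bound_holds_for_4_bit_ranges() {
  for (uint32_t hi_a = 0; hi_a < 16; hi_a++) {
    for (uint32_t hi_b = 0; hi_b < 16; hi_b++) {
      const uint32_t bound = xor_upper_bound(hi_a, hi_b);
      for (uint32_t a = 0; a <= hi_a; a++) {
        for (uint32_t b = 0; b <= hi_b; b++) {
          if ((a ^ b) > bound) return false;
        }
      }
    }
  }
  return true;
}
```

When both ranges collapse to a single constant, a compiler can simply fold `a ^ b`, which is why the description gives constant folding priority over the range bound.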
Johannes Graham has updated the pull request incrementally with one additional commit since the last revision:

  add missing import

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23089/files
  - new: https://git.openjdk.org/jdk/pull/23089/files/ce17608b..94a32dba

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=45
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=44-45

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/23089.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089

PR: https://git.openjdk.org/jdk/pull/23089

From duke at openjdk.org Sun Mar 30 05:08:45 2025
From: duke at openjdk.org (duke)
Date: Sun, 30 Mar 2025 05:08:45 GMT
Subject: Withdrawn: 8346964: C2: Improve integer multiplication with constant in MulINode::Ideal()
In-Reply-To: <4UC1x1GPJCcIwPXKJZfiUGxQnuRaDQjOcN53wYmUzF4=.fafd71c1-2f48-4ae4-8e7e-8844c578429a@github.com>
References: <4UC1x1GPJCcIwPXKJZfiUGxQnuRaDQjOcN53wYmUzF4=.fafd71c1-2f48-4ae4-8e7e-8844c578429a@github.com>
Message-ID:

On Mon, 6 Jan 2025 07:55:39 GMT, erifan wrote:

> Constant multiplication x*C can be optimized as LEFT SHIFT, ADD or SUB instructions since generally these instructions have smaller latency and larger throughput on most architectures. For example:
> 1. x*8 can be optimized as x<<3.
> 2. x*9 can be optimized as x+x<<3, and x+x<<3 can be lowered as one SHIFT-ADD (ADD instruction combined with LEFT SHIFT) instruction on some architectures, like aarch64 and x86_64.
>
> Currently OpenJDK implemented a few such patterns in mid-end, including:
> 1. |C| = 1<<n (n>0)
> 2. |C| = (1<<n) - 1 (n>0)
> 3. |C| = (1<<m) + (1<<n) (m>n, n>=0)
>
> The first two are ok. Because on most architectures they are lowered as only one ADD/SUB/SHIFT instruction.
>
> But the third pattern doesn't always perform well on some architectures, such as aarch64. The third pattern can be split as the following sub patterns:
> 3.1. C = (1<<n) + 1 (n>0)
> 3.2. C = -((1<<n) + 1) (n>0)
> 3.3. C = (1<<m) + (1<<n) (m>n, n>0)
> 3.4. C = -((1<<m) + (1<<n)) (m>n, n>0)
>
> According to Arm optimization guide, if the shift amount > 4, the latency and throughput of ADD instruction is the same with MUL instruction. So in this case, converting MUL to ADD is not profitable. Take a[i] * C on aarch64 as an example.
>
> Before (MUL is not converted):
>
>     mov x1, #C
>     mul x2, x1, x0
>
> Now (MUL is converted):
> For 3.1:
>
>     add x2, x0, x0, lsl #n
>
> For 3.2:
>
>     add x2, x0, x0, lsl #n      // same cost with mul if n > 4
>     neg x2, x2
>
> For 3.3:
>
>     lsl x1, x0, #m
>     add x2, x1, x0, lsl #n      // same cost with mul if n > 4
>
> For 3.4:
>
>     lsl x1, x0, #m
>     add x2, x1, x0, lsl #n      // same cost with mul if n > 4
>     neg x2, x2
>
> Test results (ns/op) on Arm Neoverse V2:
>
>     Benchmark    Before    Now       Uplift        Pattern   Notes
>     testInt9     103.379   60.702    1.70305756    3.1
>     testIntN33   103.231   106.825   0.96635619    3.2       n > 4
>     testIntN9    103.448   103.005   1.004300762   3.2       n <= 4
>     testInt18    103.354   99.271    1.041129837   3.3       m <= 4, n <= 4
>     testInt36    103.396   99.186    1.042445506   3.3       m > 4, n <= 4
>     testInt96    103.337   105.416   0.980278136   3.3       m > 4, n > 4
>     testIntN18   103.333   139.258   0.742025593   3.4       m <= 4, n <= 4
>     testIntN36   103.208   139.132   0.741799155   3.4       m > 4, n <= 4
>     testIntN96   103.367   139.471   0.74113615    3.4       m > 4, n > 4
>
> **(S1) From this point on, we should treat pattern 3 as follows:**
> 3.1 C = (1<<n) + 1 (n>0)
> 3.2 C = -((1<<n) + 1) ...
> 3.3 C = (1<<m) + (1<<n) (m>n, 0...
> 3.4 C = -((1<<m) + (1<<n)) ...
>
> Since this conversion is implemented in mid-end, it impacts...

This pull request has been closed without being integrated.
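As a small aside on the decompositions discussed in this PR, the sketch below checks in plain Java that the shift-add forms for sub-patterns 3.1, 3.2 and 3.3 agree with ordinary `int` multiplication, including on overflowing inputs (both sides wrap the same way). The class and method names are hypothetical and this is not the `MulINode::Ideal()` code. Note that the `x+x<<3` shorthand in the mail needs parentheses in Java, because `+` binds tighter than `<<`.

```java
// Hypothetical illustration only; not the C2 transformation itself.
final class ShiftAddDemo {

    // Pattern 3.1 with n = 3: 9 = (1 << 3) + 1
    static int times9(int x) {
        return x + (x << 3);
    }

    // Pattern 3.2 with n = 5: -33 = -((1 << 5) + 1)
    static int timesMinus33(int x) {
        return -(x + (x << 5));
    }

    // Pattern 3.3 with m = 4, n = 1: 18 = (1 << 4) + (1 << 1)
    static int times18(int x) {
        return (x << 4) + (x << 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            // int arithmetic wraps modulo 2^32 on both sides, so the identities
            // also hold when the product overflows.
            if (times9(x) != x * 9 || timesMinus33(x) != x * -33 || times18(x) != x * 18) {
                throw new AssertionError("mismatch for x = " + x);
            }
        }
        System.out.println("all shift-add forms match plain multiplication");
    }
}
```

Whether each form is actually cheaper than a `mul` depends on the target, which is the point of the benchmark table in the mail.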
------------- PR: https://git.openjdk.org/jdk/pull/22922 From syan at openjdk.org Sun Mar 30 13:10:19 2025 From: syan at openjdk.org (SendaoYan) Date: Sun, 30 Mar 2025 13:10:19 GMT Subject: RFR: 8351233: [ASAN] avx2-emu-funcs.hpp:151:20: error: =?UTF-8?B?4oCYRC44MjE4OOKAmQ==?= is used uninitialized In-Reply-To: <7o6aKz9BB_Xy5fyAHMvURof9Zp1p5qIbzikO3d0arlc=.5966f24a-b244-4b88-936b-4d3d99953375@github.com> References: <7o6aKz9BB_Xy5fyAHMvURof9Zp1p5qIbzikO3d0arlc=.5966f24a-b244-4b88-936b-4d3d99953375@github.com> Message-ID: <0GRsjTidzD3lQ6p8fn9uq-Lo6f8C623w8_1V55OZU-E=.4556d434-5b83-4c97-abed-58df53bac45b@github.com> On Sat, 29 Mar 2025 01:32:19 GMT, Vladimir Ivanov wrote: >> Hi all, >> >> The return type of function `const __m256i &perm` is `__m256i`, so `const __m256i &perm` should be replaced as 'const __m256i perm'. >> >> The function implementation in gcc/clang compiler header: >> >> 1. gcc: lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h >> >> >> extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) >> _mm256_loadu_si256 (__m256i_u const *__P) >> { >> return *__P; >> } >> >> >> 2. clang: lib64/clang/17/include/avxintrin.h >> >> >> static __inline __m256i __DEFAULT_FN_ATTRS >> _mm256_loadu_si256(__m256i_u const *__p) >> { >> struct __loadu_si256 { >> __m256i_u __v; >> } __attribute__((__packed__, __may_alias__)); >> return ((const struct __loadu_si256*)__p)->__v; >> } >> >> >> Additional testing: >> >> - [x] jtreg tests(include tier1/2/3 etc.) on linux-x64(AMD EPYC 9T24 96-Core Processor) with release build >> - [x] jtreg tests(include tier1/2/3 etc.) on linux-x64(AMD EPYC 9T24 96-Core Processor) with fastdebug build > > Testing results (hs-tier1 - hs-tier4) are clean. GHA test failures are unrelated to this PR. 
Thanks for the reviews @iwanowww ------------- PR Comment: https://git.openjdk.org/jdk/pull/23925#issuecomment-2764554842 From syan at openjdk.org Sun Mar 30 13:10:19 2025 From: syan at openjdk.org (SendaoYan) Date: Sun, 30 Mar 2025 13:10:19 GMT Subject: Integrated: 8351233: [ASAN] avx2-emu-funcs.hpp:151:20: error: =?UTF-8?B?4oCYRC44MjE4OOKAmQ==?= is used uninitialized In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 03:35:20 GMT, SendaoYan wrote: > Hi all, > > The return type of function `const __m256i &perm` is `__m256i`, so `const __m256i &perm` should be replaced as 'const __m256i perm'. > > The function implementation in gcc/clang compiler header: > > 1. gcc: lib/gcc/x86_64-pc-linux-gnu/14.2.0/include/avxintrin.h > > > extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) > _mm256_loadu_si256 (__m256i_u const *__P) > { > return *__P; > } > > > 2. clang: lib64/clang/17/include/avxintrin.h > > > static __inline __m256i __DEFAULT_FN_ATTRS > _mm256_loadu_si256(__m256i_u const *__p) > { > struct __loadu_si256 { > __m256i_u __v; > } __attribute__((__packed__, __may_alias__)); > return ((const struct __loadu_si256*)__p)->__v; > } > > > Additional testing: > > - [x] jtreg tests(include tier1/2/3 etc.) on linux-x64(AMD EPYC 9T24 96-Core Processor) with release build > - [x] jtreg tests(include tier1/2/3 etc.) on linux-x64(AMD EPYC 9T24 96-Core Processor) with fastdebug build This pull request has now been integrated. Changeset: 895aabc4 Author: SendaoYan URL: https://git.openjdk.org/jdk/commit/895aabc4632a0b5e245aeceb6c2dcdb4b07f640e Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8351233: [ASAN] avx2-emu-funcs.hpp:151:20: error: ?D.82188? 
is used uninitialized Reviewed-by: vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/23925 From rraj at openjdk.org Sun Mar 30 13:25:15 2025 From: rraj at openjdk.org (Rohit Arul Raj) Date: Sun, 30 Mar 2025 13:25:15 GMT Subject: Integrated: 8317976: Optimize SIMD sort for AMD Zen 4 In-Reply-To: References: Message-ID: <13Dw2ScGkpzCoj96YwlXZuOoiHYVtP1OktkGrkQHOaQ=.80e41b6b-b602-425d-a6aa-0221ca9998f2@github.com> On Fri, 14 Mar 2025 10:48:09 GMT, Rohit Arul Raj wrote: > In JDK-8309130, Array sort was optimized using AVX512 SIMD instructions for x86_64. Currently, this optimization has been disabled for AMD Zen 4 [JDK-8317763] due to bad performance of compressstoreu. > Ref: https://www.reddit.com/r/java/comments/171t5sj/heads_up_openjdk_implementation_of_avx512_based/. > > This patch enables Zen 4 to pick optimized AVX2 version of SIMD sort and Zen 5 picks the AVX512 version. > > JTREG Tests: Completed Tier1 & Tier2 tests on Zen4 & Zen5 - No Regressions. > > Attaching ArraySort performance data for Zen4 & Zen5. > [Zen4-ArraySort-Data.txt](https://github.com/user-attachments/files/19245831/Zen4-ArraySort-Data.txt) > [Zen5-ArraySort-Data.txt](https://github.com/user-attachments/files/19245833/Zen5-ArraySort-Data.txt) This pull request has now been integrated. 
Changeset: 8cbadf78
Author: Rohit Arul Raj
Committer: SendaoYan
URL: https://git.openjdk.org/jdk/commit/8cbadf78d04d0e3d1136a5582f281de099fc5e49
Stats: 17 lines in 3 files changed: 13 ins; 0 del; 4 mod

8317976: Optimize SIMD sort for AMD Zen 4

Reviewed-by: psandoz, vlivanov

-------------

PR: https://git.openjdk.org/jdk/pull/24053

From duke at openjdk.org Sun Mar 30 17:16:20 2025
From: duke at openjdk.org (duke)
Date: Sun, 30 Mar 2025 17:16:20 GMT
Subject: Withdrawn: 8343763: Aarch64: Gtest codestrings.validate_vm intermittent fails extra addr
In-Reply-To:
References:
Message-ID:

On Thu, 7 Nov 2024 13:43:10 GMT, SendaoYan wrote:

> Hi all,
> The `Gtest codestrings.validate_vm` intermittently fails because the disassembly contains different symbol names, for example for instructions such as `adrp` and `b`. I think the difference in symbol names is acceptable, so this PR removes the related symbol names to make the fragile disassembly comparison more robust.
> The change has been verified locally: the gtest passed in all 20k runs, except that the subtest `ThreadsListHandle::sanity_vm` sometimes fails intermittently, which is tracked by [JDK-8315141](https://bugs.openjdk.org/browse/JDK-8315141). Test-fix only, no risk.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/21955

From cslucas at openjdk.org Mon Mar 31 03:43:03 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Mon, 31 Mar 2025 03:43:03 GMT
Subject: RFR: 8334046: Set different values for CompLevel_any and CompLevel_all [v2]
In-Reply-To:
References:
Message-ID:

> Please review this trivial patch to set different values for CompLevel_any and CompLevel_all.
> Setting different values for these fields makes the implementation of [this other issue](https://bugs.openjdk.org/browse/JDK-8313713) much cleaner/easier.
> Tested on OSX/Linux Aarch64/x86_64 with JTREG.

Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Fix WhiteBox constants. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24298/files - new: https://git.openjdk.org/jdk/pull/24298/files/dc6e5cdb..b121160f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24298&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24298&range=00-01 Stats: 7 lines in 2 files changed: 2 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24298.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24298/head:pull/24298 PR: https://git.openjdk.org/jdk/pull/24298 From rehn at openjdk.org Mon Mar 31 06:38:16 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 31 Mar 2025 06:38:16 GMT Subject: RFR: 8353219: RISC-V: Fix client builds after JDK-8345298 In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 02:01:17 GMT, Fei Yang wrote: > Hi, please review this trivial change fixing a client build issue. > The definitions of both `generate_float16ToFloat()` and `generate_floatToFloat16()` should be moved out of `COMPILER2_OR_JVMCI` macro scope. Testing: client builds fine on linux-riscv64 with this change. Thanks! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24307#pullrequestreview-2728341092 From mchevalier at openjdk.org Mon Mar 31 06:49:50 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 31 Mar 2025 06:49:50 GMT Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces [v2] In-Reply-To: References: Message-ID: > If `TypeInstKlassPtr` represents an array type, it has to be `java.lang.Object`. From contraposition, if it is not `java.lang.Object`, we can conclude it is not an array, and we can skip some array checks, for instance. 
> > In this PR, we improve this deduction with an interface base reasoning: arrays implements only Cloneable and Serializable, so if a type implements anything else, it cannot be an array. > > This change partially reverts the changes from [JDK-8348631](https://bugs.openjdk.org/browse/JDK-8348631) (#23331) (in `LibraryCallKit::generate_array_guard_common`) and the test still passes. > > The way interfaces are check might be done differently. The current situation is a balance between visibility (not to leak too much things explicitly private), having not overly general methods for one use-case and avoiding too concrete (and brittle) interfaces. > > Tested with tier1..3, hs-precheckin-comp and hs-comp-stress > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: not reinventing the wheel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24245/files - new: https://git.openjdk.org/jdk/pull/24245/files/a77c397c..daaaf9ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24245&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24245&range=00-01 Stats: 13 lines in 1 file changed: 0 ins; 12 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24245/head:pull/24245 PR: https://git.openjdk.org/jdk/pull/24245 From mchevalier at openjdk.org Mon Mar 31 06:49:50 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 31 Mar 2025 06:49:50 GMT Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces [v2] In-Reply-To: References: <-Ri4lJUzCkI9yLG-kGwTGeAhd453SDgt_qvoB1iw4_A=.f3e126ab-a4ff-4f7f-80a7-c6e739cc6727@github.com> Message-ID: <48D8vzTXZDKtZxAMTDdo9ggjWnWn7XNjs6rZqwuDZxc=.d833c90c-09da-4167-aec9-aba8b9e523b5@github.com> On Fri, 28 Mar 2025 19:24:22 GMT, Tobias Hartmann wrote: >> Almost! 
>> >> return !TypeAryPtr::_array_interfaces->contains(this); >> >> Contains is about TypeInterfaces, that is set of interfaces. So I just need to check that `this` is not a sub-set of array interfaces. That should do it. > > Now I'm confused, isn't this what I proposed? I didn't check the exact syntax, I just wondered if the `TypeInterfaces::contains` method couldn't be used instead of adding a new method. Yes, totally! It's just a detail difference. But there is another question: whether we still want `has_non_array_interface` has a wrapper for this call with a more explicit name, or if we simply inline your suggestion on the callsite of `has_non_array_interface`. I tend toward the first, I like explicit names, and I suspect it might be useful in more than one place, but not a strong opinion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24245#discussion_r2020483393 From galder at openjdk.org Mon Mar 31 07:25:11 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Mon, 31 Mar 2025 07:25:11 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v2] In-Reply-To: <_ISX3hSaQuWOvkh8KUsOS69y_aDB6JZGSWsVT1DWq4k=.de29649c-00f1-4e84-9a46-75ef89e8e30a@github.com> References: <_ISX3hSaQuWOvkh8KUsOS69y_aDB6JZGSWsVT1DWq4k=.de29649c-00f1-4e84-9a46-75ef89e8e30a@github.com> Message-ID: On Thu, 27 Mar 2025 05:19:43 GMT, Galder Zamarre?o wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - fix tests >> - whitespace >> - whitespace >> - fix whitespace >> - JDK-8344942 > > Looks great though I'm not too familiar with the code to be able to do a reasonable review, but I had a question: > > Have you got any practical use case that can show where you've used this and show what it takes to build such a use case? `VectorReduction2` or similar type of microbenchmarks would be great to see auto generated using this? > > The reason I ask this is because I feel that something that is missing in this PR is a small practical use case where this framework is put into action to actually generate some jtreg/IR/microbenchmark test and see how it runs as part of the CI in the PR. WDYT? > @galderz Thanks for your questions! > > > Looks great though I'm not too familiar with the code to be able to do a reasonable review > > Well the code is all brand new, so really anybody could review ;) Right, what I meant is that developers that have past history with this work will be able to provide a more thorough review :) > > Have you got any practical use case that can show where you've used this and show what it takes to build such a use case? > > I actually have a list of experiments in this branch (it is linked in the PR description): #23418 Some of them use the IR framework, though for now just as a testing harness, not for IR rules. Generating IR rules automatically requires quite a bit of logic... I hope that is satisfactory for now? Yes it is. Sorry I missed the linked PR when I read the description. The examples there look great, it's what I was looking for. > Ah, but there was this test: `test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java` I did now not refactor it, but it would not be too hard to see how to use the Templates for it. And I do generate IR rules in that one. 
I don't super like just refactoring old tests... there is always a risk of breaking it and then coverage is worse than before... > > > The reason I ask this is because I feel that something that is missing in this PR is a small practical use case where this framework is put into action to actually generate some jtreg/IR/microbenchmark test and see how it runs as part of the CI in the PR. WDYT? > > I also have a few tests in this PR that just generate regular JTREG tests, without the IR framework, did you see those? Yeah I've seen them now. > > > VectorReduction2 or similar type of microbenchmarks would be great to see auto generated using this? > > I don't yet have a solution for microbenchmarks. It's mostly an issue of including the `test/hotspot/jtreg/compiler/lib` path... And I fear that JMH requires all benchmark code to be compiled beforehand, and not dynamically as I do with the class loader. But maybe there is a solution for that. > > The patch is already quite large, and so I wanted to just publish the basic framework. Do you think that is ok? Yeah sure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2765342389 From chagedorn at openjdk.org Mon Mar 31 07:41:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 31 Mar 2025 07:41:12 GMT Subject: RFR: 8350577: Fix missing Assertion Predicates when splitting loops [v3] In-Reply-To: <2pn7Amnyitbm2OHxM5COLNJAc3E27C1ShNWxi44Ul-Q=.a50d9a76-2277-48bc-a12f-3d97742aaa0b@github.com> References: <2pn7Amnyitbm2OHxM5COLNJAc3E27C1ShNWxi44Ul-Q=.a50d9a76-2277-48bc-a12f-3d97742aaa0b@github.com> Message-ID: On Fri, 28 Mar 2025 10:25:44 GMT, Christian Hagedorn wrote: >> _Note: The actual fix is only ~80 changed lines - everything else is about tests._ >> >> After integrating many preparatory sub-tasks, I'm finally fixing the last outstanding Assertion Predicate issues with this patch. 
>> >> For more background about Assertion Predicates, have a look at the following [blog post](https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html). >> >> ### Maintain Assertion Predicates when Splitting a Loop >> When performing Loop Predication on a counted loop, we create two Template Assertion Predicates for each hoisted range check. Whenever we split this loop as part of loop opts, we need to establish the Template Assertion Predicates at the new sub loops with the new loop init and stride and create Initialized Assertion Predicates from them. This ensures that a sub loop is folded when it becomes dead due to an impossible condition (e.g. always having a negative index for the hoisted range check in all loop iterations). >> >> #### Current State >> Today, we are already covering most of the cases where Assertion Predicates are required - but we are still missing some. The following table describes the current state for the different loop splitting optimizations: >> >> | Loop Optimization | Template Assertion Predicate | Initialized Assertion Predicate | >> | ------------------------ | --------------------------------------- | --------------------------------------- | >> | Create Main Loop | ? | ? | >> | Create Post Loop | ? | ? | >> | Loop Unswitching | ? | _not required, same init, stride and, limit_ | >> | Loop Unrolling | ? | ? | >> | Range Check Elimination | ? | ? | >> | Loop Peeling | ? | ? | >> | Splitting Main Loop | ? | ? | >> >> Whenever we apply a loop optimization that does not establish Template Assertion Predicates, then all subsequent loop splitting optimizations on that loop cannot establish Template Assertion Predicates, either, and we fail to emit Initialized Assertion Predicates which can lead to a broken graph. >> >> #### Fixing Unsupported Cases >> This patch provides fixes for the remaining unsupported cases as shown in the table above. 
With all the work done in previous PRs, the fix is quite straight forward: >> - Remove the restriction that we don't clone Template Assertion Predicate in Loop Peeling and post loop creation. >> - Remove the rest... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Remove UseLoopPredicate Testing looked good! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24246#issuecomment-2765374124 From chagedorn at openjdk.org Mon Mar 31 07:41:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 31 Mar 2025 07:41:12 GMT Subject: Integrated: 8350577: Fix missing Assertion Predicates when splitting loops In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 09:21:50 GMT, Christian Hagedorn wrote: > _Note: The actual fix is only ~80 changed lines - everything else is about tests._ > > After integrating many preparatory sub-tasks, I'm finally fixing the last outstanding Assertion Predicate issues with this patch. > > For more background about Assertion Predicates, have a look at the following [blog post](https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html). > > ### Maintain Assertion Predicates when Splitting a Loop > When performing Loop Predication on a counted loop, we create two Template Assertion Predicates for each hoisted range check. Whenever we split this loop as part of loop opts, we need to establish the Template Assertion Predicates at the new sub loops with the new loop init and stride and create Initialized Assertion Predicates from them. This ensures that a sub loop is folded when it becomes dead due to an impossible condition (e.g. always having a negative index for the hoisted range check in all loop iterations). > > #### Current State > Today, we are already covering most of the cases where Assertion Predicates are required - but we are still missing some. 
The following table describes the current state for the different loop splitting optimizations: > > | Loop Optimization | Template Assertion Predicate | Initialized Assertion Predicate | > | ------------------------ | --------------------------------------- | --------------------------------------- | > | Create Main Loop | ? | ? | > | Create Post Loop | ? | ? | > | Loop Unswitching | ? | _not required, same init, stride and, limit_ | > | Loop Unrolling | ? | ? | > | Range Check Elimination | ? | ? | > | Loop Peeling | ? | ? | > | Splitting Main Loop | ? | ? | > > Whenever we apply a loop optimization that does not establish Template Assertion Predicates, then all subsequent loop splitting optimizations on that loop cannot establish Template Assertion Predicates, either, and we fail to emit Initialized Assertion Predicates which can lead to a broken graph. > > #### Fixing Unsupported Cases > This patch provides fixes for the remaining unsupported cases as shown in the table above. With all the work done in previous PRs, the fix is quite straight forward: > - Remove the restriction that we don't clone Template Assertion Predicate in Loop Peeling and post loop creation. > - Remove the restriction that we only clone Template Assertion Predicate ... This pull request has now been integrated. 
Changeset: 25925138 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/25925138b0a7d781d9293e52a8c9520329a85219 Stats: 1821 lines in 5 files changed: 1711 ins; 39 del; 71 mod 8350577: Fix missing Assertion Predicates when splitting loops Reviewed-by: epeter, kvn ------------- PR: https://git.openjdk.org/jdk/pull/24246 From chagedorn at openjdk.org Mon Mar 31 07:44:42 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 31 Mar 2025 07:44:42 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v3] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 15:32:30 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]` as a >> consequence, `i` is known to be `[0, arraySize-1]`. The range check >> can be eliminated. >> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. >> >> What I propose here is that when the `CastII` becomes dead, any CFG >> paths that use the `CastII` node is made unreachable. 
So in pseudo code: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered. >> >> The patch applies this to all `Type` nodes as with 8275202, I also ran >> in some rare corner cases with other types of nodes. The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). >> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII)` >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out of bound value occurs. >> >> This is likely fixable some other way (eventhough it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that they are more hiding), I think it would >> make sense to try some other way of approaching them. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8349479 > - review > - whitespace > - fix & test Thanks for adding the `KillPathsReachableByDeadTypeNode` switch! Now that https://github.com/openjdk/jdk/pull/24246 is in, can you add `KillPathsReachableByDeadTypeNode` to the `TestAssertionPredicates.java` runs? 
To still run them with product, you can probably just use it together with `IgnoreUnrecognizedVMOptions`.

src/hotspot/share/opto/c2_globals.hpp line 824:

> 822:   develop(bool, KillPathsReachableByDeadTypeNode, true, \
> 823:           "When a Type node becomes top, make paths where the node is used" \
> 824:           "dead by replacing them with a Halt node") \

Maybe we can also add a warning of possible failures. Something like:
Suggestion:

  develop(bool, KillPathsReachableByDeadTypeNode, true, \
          "When a Type node becomes top, make paths where the node is " \
          "used dead by replacing them with a Halt node. Turning this off " \
          "could corrupt the graph in rare cases and should be used with " \
          "care.") \

src/hotspot/share/opto/castnode.cpp line 99:

> 97: // Return a node which is more "ideal" than the current node. Strip out
> 98: // control copies
> 99: Node *ConstraintCastNode::Ideal(PhaseGVN *phase, bool can_reshape) {

While at it:
Suggestion:

Node* ConstraintCastNode::Ideal(PhaseGVN* phase, bool can_reshape) {

src/hotspot/share/opto/castnode.cpp line 103:

> 101:     return this;
> 102:   }
> 103:   if (in(1) != nullptr && phase->type(in(1)) != Type::TOP) {

Can `in(1)` ever be null?

src/hotspot/share/opto/convertnode.cpp line 732:

> 730:
> 731: //------------------------------Ideal------------------------------------------
> 732: Node *ConvI2LNode::Ideal(PhaseGVN *phase, bool can_reshape) {

While at it:
Suggestion:

Node* ConvI2LNode::Ideal(PhaseGVN* phase, bool can_reshape) {

src/hotspot/share/opto/convertnode.cpp line 733:

> 731: //------------------------------Ideal------------------------------------------
> 732: Node *ConvI2LNode::Ideal(PhaseGVN *phase, bool can_reshape) {
> 733:   if (in(1) != nullptr && phase->type(in(1)) != Type::TOP) {

Same here and in `ConvL2I`, can `in(1)` ever be null?

src/hotspot/share/opto/convertnode.cpp line 841:

> 839: // Return a node which is more "ideal" than the current node.
> 840: // Blow off prior masking to int > 841: Node *ConvL2INode::Ideal(PhaseGVN *phase, bool can_reshape) { Suggestion: Node* ConvL2INode::Ideal(PhaseGVN* phase, bool can_reshape) { ------------- PR Review: https://git.openjdk.org/jdk/pull/23468#pullrequestreview-2728428324 PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2020526901 PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2020527365 PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2020528307 PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2020530780 PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2020531057 PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2020531623 From mchevalier at openjdk.org Mon Mar 31 07:54:14 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 31 Mar 2025 07:54:14 GMT Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v2] In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: On Wed, 26 Mar 2025 08:33:58 GMT, Marc Chevalier wrote: >> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. >> >> tl;dr: >> - C1: no problem, no change >> - C2: >> - with intrinsics: >> - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms) >> - without overflow: no problem, no change >> - without intrinsics: no problem, no change >> >> Before the fix: >> >> Benchmark (SIZE) Mode Cnt Score Error Units >> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 0.048 ms/op >> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 
58.238 ms/op >> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op >> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op >> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op >> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op >> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op >> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op >> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 12.822 ms/op >> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op >> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 3.316 ms/op >> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op >> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op >> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op >> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op >> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op >> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op >> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op >> MathExact.C1_1.loop... > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Use builtin_throw > - Merge branch 'master' into fix/Deoptimization-and-re-compilation-cycle-with-C2-compiled-code > - More exhaustive bench > - Limit inlining of math Exact operations in case of too many deopts Actually, yes, there is a reason I've made it so weird (and I agree it's pretty convoluted). 
`builtin_throw` kicks in if `too_many_traps(reason)` is true (and in another case, but it might not apply):
https://github.com/openjdk/jdk/blob/59629f88e6fad9c1ff91be4cfea83f78f0ea503c/src/hotspot/share/opto/graphKit.cpp#L540-L555

If `treat_throw_as_hot` is false (so before too many traps), it just ends up as an `uncommon_trap` with the `Action_maybe_recompile` action. That is fine at first. But later, we would like `builtin_throw` to do its job, and it can only do so if the condition at
https://github.com/openjdk/jdk/blob/59629f88e6fad9c1ff91be4cfea83f78f0ea503c/src/hotspot/share/opto/graphKit.cpp#L563
holds, which is not the same condition as `too_many_traps(reason)`.

Which means that:
- if we don't bail out of intrinsics on `too_many_traps(reason)`, we may end up in the same situation as in the bug, with deopt cycles, whenever `builtin_throw` doesn't do its job (for instance when `method()->can_omit_stack_trace()` is false)
- if we bail out of intrinsics on `too_many_traps(reason)`, then `builtin_throw` never gets a hot enough throw that it can speed up, and we are in the same situation as with my first version, before you suggested `builtin_throw` (with similar performance for C2 and C1).

In other words, `too_many_traps(reason)` must be reached for `builtin_throw` to have a chance to do something, but it might still do nothing, and in that case we need to bail out of intrinsics, otherwise we will repeatedly deopt.

So, when `too_many_traps(reason)` is true, we have two options: either we hand the throw to `builtin_throw` or we bail out. And to avoid the deopt cycles, we must know in advance whether `builtin_throw` will do its job or just default to an `uncommon_trap` again (in which case bailing out is better). This is why I extracted the condition for `builtin_throw` into `builtin_throw_applies`: so that the intrinsic can decide what is best to do.

Some of your suggestions are still relevant tho! I'll apply them.

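To make the scenario concrete, here is a minimal sketch of the kind of Java code this discussion is about: a hot loop where `Math.addExact` overflows on most iterations, so the overflow path is anything but uncommon. It is not the `MathExact` benchmark from the PR; the class and method names are hypothetical, and the sketch only shows the Java-level behaviour (an `ArithmeticException` per overflow), not what C2 does with it.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the benchmark from the PR.
final class ExactOverflowDemo {

    // Counts how many additions overflow. Math.addExact throws
    // ArithmeticException whenever the int result does not fit.
    static int countOverflows(int[] values, int addend) {
        int overflows = 0;
        for (int v : values) {
            try {
                Math.addExact(v, addend);
            } catch (ArithmeticException e) {
                overflows++;
            }
        }
        return overflows;
    }

    public static void main(String[] args) {
        int[] values = new int[1000];
        Arrays.fill(values, Integer.MAX_VALUE);
        // Every addition of 1 overflows; adding 0 never does.
        System.out.println("overflowing case: " + countOverflows(values, 1));
        System.out.println("in-bounds case:   " + countOverflows(values, 0));
    }
}
```

In the overflowing case every iteration takes the exceptional path, which is the situation where an intrinsic that treats overflow as an uncommon trap would keep deoptimizing.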
-------------

PR Comment: https://git.openjdk.org/jdk/pull/23916#issuecomment-2765414288

From mchevalier at openjdk.org Mon Mar 31 08:05:50 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Mon, 31 Mar 2025 08:05:50 GMT
Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v3]
In-Reply-To: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
Message-ID:

> `Math.*Exact` intrinsics can cause many deopts when used repeatedly with problematic arguments.
> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached.
>
> Benchmarks show that this issue affects every Math.*Exact function, and this fix improves them all.
>
> tl;dr:
> - C1: no problem, no change
> - C2:
>   - with intrinsics:
>     - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
>     - without overflow: no problem, no change
>   - without intrinsics: no problem, no change
>
> Before the fix:
>
> Benchmark (SIZE) Mode Cnt Score Error Units
> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ± 0.048 ms/op
> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ± 58.238 ms/op
> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ± 0.842 ms/op
> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ± 229.425 ms/op
> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ± 22.244 ms/op
> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ± 279.003 ms/op
> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ± 0.810 ms/op
> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ± 141.792 ms/op
> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ± 12.822 ms/op
> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ± 274.396 ms/op
> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ± 3.316 ms/op
> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ± 71.879 ms/op
> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ± 0.587 ms/op
> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ± 252.137 ms/op
> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ± 5.718 ms/op
> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ± 147.432 ms/op
> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ± 0.027 ms/op
> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ± 30.841 ms/op
> MathExact.C1_1.loopNegateLInBounds 1000000 avgt 3 2.422 ± 3.59...

Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:

  guess_exception_from_deopt_reason out of builtin_throw

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23916/files
  - new: https://git.openjdk.org/jdk/pull/23916/files/9372228d..41d7a1d4

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=02
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=01-02

Stats: 49 lines in 2 files changed: 21 ins; 25 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/23916.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23916/head:pull/23916

PR: https://git.openjdk.org/jdk/pull/23916

From shade at openjdk.org Mon Mar 31 08:18:28 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 31 Mar 2025 08:18:28 GMT
Subject: RFR: 8351156: C1: Remove FPU stack support after 32-bit x86 removal [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 27 Mar 2025 19:41:48 GMT, Aleksey Shipilev wrote:

>> C1 has the 32-bit x86 specific code that supports x87 FPU stack allocations. With 32-bit x86 port removed, we can clean up those parts. 64-bit x86 does not need x87 FPU, since it is baselined on SSE2 and using XMM registers instead.
>>
>> There are lots of deeper cleanups possible; this PR focuses on removing the x87 FPU allocation/uses in C1. Note current C1 nomenclature is confusing.
On all arches, "FPU" means floating-point _registers_. That is _except_ on x86, where "FPU" means x87 FPU stack, and "XMM" means floating-point registers. This is why we touch "FPU" code on other architectures only in a light manner. After all 32-bit x86 cleanups land, we might consider renaming "XMM" -> "FPU" in x86 C1 to match the common nomenclature.
>>
>> Brief tour of changes:
>> - FPU stack simulator is not needed anymore, so I removed it and the related infrastructure
>> - Lots of 32-bit specific paths that touch x87 FPU registers are pruned
>> - Related LIR nodes like `lir_fxch`, `lir_fld`, `lir_fpop_raw` are pruned
>> - Simplified the API that is no longer needed, e.g. dropping `pop_fpu_stack`
>>
>> This PR would likely conflict with some in-flight cleanups, so it would require merges later. Take a look meanwhile.
>>
>> Additional testing:
>> - [x] Linux x86_64 server fastdebug, `tier1`
>> - [x] Linux x86_64 server fastdebug, `tier1` + `-XX:TieredStopAtLevel=1`
>> - [x] Linux x86_64 server fastdebug, `all`
>> - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>
> - Merge branch 'master' into JDK-8351156-x86-c1-fpustack
> - Fixing build failures after reg2stack
> - Remove remaining FPU uses in LIRAssembler_x86
> - Touchups
> - Initial fix

Thank you! Here we go.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24274#issuecomment-2765463649

From shade at openjdk.org Mon Mar 31 08:18:28 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 31 Mar 2025 08:18:28 GMT
Subject: Integrated: 8351156: C1: Remove FPU stack support after 32-bit x86 removal
In-Reply-To:
References:
Message-ID:

On Thu, 27 Mar 2025 10:18:24 GMT, Aleksey Shipilev wrote:

> C1 has the 32-bit x86 specific code that supports x87 FPU stack allocations. With 32-bit x86 port removed, we can clean up those parts.
64-bit x86 does not need x87 FPU, since it is baselined on SSE2 and using XMM registers instead.
>
> There are lots of deeper cleanups possible; this PR focuses on removing the x87 FPU allocation/uses in C1. Note current C1 nomenclature is confusing. On all arches, "FPU" means floating-point _registers_. That is _except_ on x86, where "FPU" means x87 FPU stack, and "XMM" means floating-point registers. This is why we touch "FPU" code on other architectures only in a light manner. After all 32-bit x86 cleanups land, we might consider renaming "XMM" -> "FPU" in x86 C1 to match the common nomenclature.
>
> Brief tour of changes:
> - FPU stack simulator is not needed anymore, so I removed it and the related infrastructure
> - Lots of 32-bit specific paths that touch x87 FPU registers are pruned
> - Related LIR nodes like `lir_fxch`, `lir_fld`, `lir_fpop_raw` are pruned
> - Simplified the API that is no longer needed, e.g. dropping `pop_fpu_stack`
>
> This PR would likely conflict with some in-flight cleanups, so it would require merges later. Take a look meanwhile.
>
> Additional testing:
> - [x] Linux x86_64 server fastdebug, `tier1`
> - [x] Linux x86_64 server fastdebug, `tier1` + `-XX:TieredStopAtLevel=1`
> - [x] Linux x86_64 server fastdebug, `all`
> - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1`

This pull request has now been integrated.
Changeset: 23e3b3ff
Author: Aleksey Shipilev
URL: https://git.openjdk.org/jdk/commit/23e3b3ff6ab17a71b16fdf2e61548a7413ddb6d4
Stats: 2555 lines in 39 files changed: 0 ins; 2508 del; 47 mod

8351156: C1: Remove FPU stack support after 32-bit x86 removal

Reviewed-by: vlivanov, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/24274

From duke at openjdk.org Mon Mar 31 08:22:15 2025
From: duke at openjdk.org (Anjian-Wen)
Date: Mon, 31 Mar 2025 08:22:15 GMT
Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v2]
In-Reply-To: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com>
References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com>
Message-ID:

> support Zvbb Vector And-Not draft

Anjian-Wen has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:

  RISC-V: C2: Support Zvbb Vector And-Not instruction

  add Vector And-Not match rule and tests

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24129/files
  - new: https://git.openjdk.org/jdk/pull/24129/files/45f61ff8..7fc67099

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=01
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=00-01

Stats: 74 lines in 3 files changed: 1 ins; 65 del; 8 mod
Patch: https://git.openjdk.org/jdk/pull/24129.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24129/head:pull/24129

PR: https://git.openjdk.org/jdk/pull/24129

From mchevalier at openjdk.org Mon Mar 31 08:33:42 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Mon, 31 Mar 2025 08:33:42 GMT
Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v4]
In-Reply-To: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
Message-ID:

> `Math.*Exact` intrinsics can cause many deopts when used repeatedly with problematic arguments.
> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached.
>
> Benchmarks show that this issue affects every Math.*Exact function, and this fix improves them all.
>
> tl;dr:
> - C1: no problem, no change
> - C2:
>   - with intrinsics:
>     - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
>     - without overflow: no problem, no change
>   - without intrinsics: no problem, no change
>
> Before the fix:
>
> Benchmark (SIZE) Mode Cnt Score Error Units
> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ± 0.048 ms/op
> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ± 58.238 ms/op
> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ± 0.842 ms/op
> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ± 229.425 ms/op
> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ± 22.244 ms/op
> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ± 279.003 ms/op
> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ± 0.810 ms/op
> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ± 141.792 ms/op
> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ± 12.822 ms/op
> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ± 274.396 ms/op
> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ± 3.316 ms/op
> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ± 71.879 ms/op
> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ± 0.587 ms/op
> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ± 252.137 ms/op
> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ± 5.718 ms/op
> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ± 147.432 ms/op
> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ± 0.027 ms/op
> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ± 30.841 ms/op
> MathExact.C1_1.loopNegateLInBounds 1000000 avgt 3 2.422 ± 3.59...

Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:

 - Merge branch 'master' into fix/Deoptimization-and-re-compilation-cycle-with-C2-compiled-code
 - guess_exception_from_deopt_reason out of builtin_throw
 - Use builtin_throw
 - Merge branch 'master' into fix/Deoptimization-and-re-compilation-cycle-with-C2-compiled-code
 - More exhaustive bench
 - Limit inlining of math Exact operations in case of too many deopts

-------------

Changes: https://git.openjdk.org/jdk/pull/23916/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=03
Stats: 759 lines in 6 files changed: 723 ins; 27 del; 9 mod
Patch: https://git.openjdk.org/jdk/pull/23916.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23916/head:pull/23916

PR: https://git.openjdk.org/jdk/pull/23916

From roland at openjdk.org Mon Mar 31 08:49:11 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 31 Mar 2025 08:49:11 GMT
Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v9]
In-Reply-To:
References:
Message-ID:

On Tue, 25 Mar 2025 16:35:24 GMT, Emanuel Peter wrote:

>>> Ah, right, I see that you already mentioned that above. Should we then problem list the test with this change? Testing looks clean otherwise.
>>
>> https://github.com/openjdk/jdk/pull/23465 is a fix for JDK-8341976 and given it's much simpler than this change, I suppose it will get in first.
>
> @rwestrel Is this ready for review?

@eme64 yes, it's ready for review.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2765556614 From chagedorn at openjdk.org Mon Mar 31 08:53:14 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 31 Mar 2025 08:53:14 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v2] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 08:03:12 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. 
`$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want to generate the `javadocs`: >> `javadoc -sourcepath test/hotspot/jtreg:./test/lib compiler.lib.template_framework` >> >> **History** >> @TobiHartmann and I have played with code generators for a while, and have had ... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 > - fix tests > - whitespace > - whitespace > - fix whitespace > - JDK-8344942 Impressive work Emanuel! This is a very valuable framework to improve our testing coverage. Here are some first comments after skimming some of the code and comments. I will deep dive more into the code and documentation later. test/hotspot/jtreg/compiler/lib/template_framework/README.md line 2: > 1: # Template Framework > 2: The Template Framework allows the generation of code with Templates. The goal is that these Templates are easy to write, and allow regression tests to cover a larger scope, and to make temlate based fuzzing easy to extend. Suggestion: The Template Framework allows the generation of code with Templates. 
The goal is that these Templates are easy to write, and allow regression tests to cover a larger scope, and to make template based fuzzing easy to extend. test/hotspot/jtreg/compiler/lib/template_framework/README.md line 6: > 4: The Template Framework only generates code in the form of a String. This code can then be compiled and executed, for example with help of the [Compile Framework](../compile_framework/README.md). > 5: > 6: The basic functionalities of the Template Framework are described in the [Template Class](./Template.java), together with some examples. More examples can be found in [TestSimple.java](../../../testlibrary_tests/template_framework/examples/TestSimple.java) and [TestTutorial.java](../../../testlibrary_tests/template_framework/examples/TestTutorial.java). Suggestion: The basic functionalities of the Template Framework are described in the [Template Interface](./Template.java), together with some examples. More examples can be found in [TestSimple.java](../../../testlibrary_tests/template_framework/examples/TestSimple.java) and [TestTutorial.java](../../../testlibrary_tests/template_framework/examples/TestTutorial.java). test/hotspot/jtreg/compiler/lib/template_framework/README.md line 8: > 6: The basic functionalities of the Template Framework are described in the [Template Class](./Template.java), together with some examples. More examples can be found in [TestSimple.java](../../../testlibrary_tests/template_framework/examples/TestSimple.java) and [TestTutorial.java](../../../testlibrary_tests/template_framework/examples/TestTutorial.java). > 7: > 8: The [Template Library](../template_library/README.md) provides a large number of Templates which can be used to create anything from simple regression tests to complex fuzzers. `template_library/README.md` does not exist. Another thought: Should `template_library` be a subfolder in `template_framework`? Otherwise, it could suggest to be something separate from the Template Framework. 
test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 32:

> 30:
> 31: /**
> 32: * {@link Template}s are used to generate code, based on {@link Token} which are rendered to {@link String}.

Tokens and Strings?

Suggestion:

 * {@link Template}s are used to generate code, based on {@link Token}s which are rendered to {@link String}s.

test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 35:

> 33: *
> 34: *
> 35: * A {@link Template} can have zero or more arguments, and for each number of arguments there is an implementation

Since the `README` refers to this file for more information (which is perfectly fine to avoid repetition), I naturally started to read here at the top. But in this paragraph, we are already explaining details of the implementation without giving a more general introduction and motivation for the Template Framework, which can leave readers who lack the background confused. I think it could be worth spending some more time on the README (which you can then also just refer to when suggesting to use the framework in PRs). You could cover: why the framework is useful/why I should care about it, some leading example/testing scenario and why it is really hard to cover that without the framework (i.e. what we've done so far until today), how the framework roughly works, how easy it is to write tests (you can then reference example tests from there), etc. You can still do many references from the `README` to classes and examples.

Side note: I admit that I could have extended the IR framework `README` with a more motivational introduction as well - maybe I will add that later.

test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 41:

> 39: * the {@link Template}s provide hashtag replacements in the Strings: the {@link Template} argument
> 40: * names are captured, and the argument values automatically replace any {@code "#name"} in the Strings. See the
> 41: * different overloads of {@link make} for examples. Additional hashtag replacements can be defined with {@link let}.

General Javadocs comment: You should precede method names with `#` in order to create a proper link.

Without:
![image](https://github.com/user-attachments/assets/b53280a4-8d5a-40a6-84a6-07fc78aa0c2d)

With (i.e.
`{@link make}` and `{@link #let})`: ![image](https://github.com/user-attachments/assets/1722614d-9535-410f-b2de-494ecd45a42a) ------------- PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2728506638 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2020575952 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2020584958 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2020579026 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2020593912 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2020620543 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2020591523 From epeter at openjdk.org Mon Mar 31 08:57:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 31 Mar 2025 08:57:33 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v5] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. > > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision:

 - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects
 - Verify.Options refactor for Galder
 - Update test/hotspot/jtreg/compiler/lib/verify/Verify.java

   Co-authored-by: Galder Zamarreño
 - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects
 - clean up test
 - JDK-8352869

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24224/files
  - new: https://git.openjdk.org/jdk/pull/24224/files/f2ed085a..d46c45de

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=04
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=03-04

Stats: 6635 lines in 96 files changed: 3713 ins; 2593 del; 329 mod
Patch: https://git.openjdk.org/jdk/pull/24224.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224

PR: https://git.openjdk.org/jdk/pull/24224

From epeter at openjdk.org Mon Mar 31 08:58:53 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 31 Mar 2025 08:58:53 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v3]
In-Reply-To:
References:
Message-ID:

> **Goal**
> We want to generate Java source code:
> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc.
> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements).
>
> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC).
>
> **How to get started**
> When reviewing, please start by looking at:
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76
>
> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`.
This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/jtreg:./test/lib compiler.lib.template_framework` > > **History** > @TobiHartmann and I have played with code generators for a while, and have had the dream of doing that in a more principled way. And to hopefully... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Apply suggestions from code review Co-authored-by: Christian Hagedorn - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 - fix tests - whitespace - whitespace - fix whitespace - JDK-8344942 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/ededf45b..85bcc6eb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=01-02 Stats: 11507 lines in 249 files changed: 5411 ins; 5426 del; 670 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From roland at openjdk.org Mon Mar 31 09:17:33 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 31 Mar 2025 09:17:33 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v6] In-Reply-To: References: Message-ID: <2PMc8aj1hZrI84JBUfE0jchLTzic8Z-8APUu1cYVlPc=.72a1f026-7262-4069-8878-e30bbd5363e1@github.com> > The `arraycopy` writes to a non escaping array so its `ArrayCopy` node > is marked as having a narrow memory effect. One of the loads from the > destination after the copy is transformed into a load from the source > array (the rationale being that if there's no load from the > destination of the copy, the `arraycopy` is not needed). The load from > the source has the input memory state of the `ArrayCopy` as memory > input. That load is then sunk out of the loop and its control is > updated to be after the `ArrayCopy`. That's legal because the > `ArrayCopy` only has a narrow memory effect and can't modify the > source. The `ArrayCopy` can't be eliminated and is expanded. In the > process, a `MemBar` that has a wide memory effect is added. 
The load
> from the source has control after the membar but memory state before
> and because the membar has a wide memory effect, the load is anti
> dependent on the membar: the graph is broken (the load can't be pinned
> after the membar and anti dependent on it).
>
> In short, the problem is that the graph is transformed under the
> assumption that the `ArrayCopy` has a narrow effect but the
> `ArrayCopy` is expanded to a subgraph that has a wide memory
> effect. The fix I propose is to not insert a membar with a wide memory
> effect. We still need a membar when the destination is non escaping
> because the expanded `ArrayCopy`, if it writes to a tightly allocated
> array, writes to raw memory and not to the destination memory slice.

Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision:

 - review
 - review
 - Merge branch 'master' into JDK-8341976
 - -XX:+TraceLoopOpts fix
 - review
 - more
 - Merge branch 'master' into JDK-8341976
 - more
 - exp
 - fix
 - ...
and 4 more: https://git.openjdk.org/jdk/compare/54d3daff...9f79e0b0 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23465/files - new: https://git.openjdk.org/jdk/pull/23465/files/6d48b9f2..9f79e0b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=04-05 Stats: 70127 lines in 1982 files changed: 21068 ins; 40124 del; 8935 mod Patch: https://git.openjdk.org/jdk/pull/23465.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23465/head:pull/23465 PR: https://git.openjdk.org/jdk/pull/23465 From roland at openjdk.org Mon Mar 31 09:20:37 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 31 Mar 2025 09:20:37 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v6] In-Reply-To: <5oGeRDTLETGizI0hd14DBW5z3qoi7-IaI_3ESDhBH2c=.2dd179fe-5f0d-4e93-a452-aa50ebd29c68@github.com> References: <4hKl3zRJ6EP4QA-iuKiEpdwIqFk2-YvrpixAGy_VidU=.e9490e22-7751-41a6-a3e7-202930be570a@github.com> <5oGeRDTLETGizI0hd14DBW5z3qoi7-IaI_3ESDhBH2c=.2dd179fe-5f0d-4e93-a452-aa50ebd29c68@github.com> Message-ID: On Mon, 24 Mar 2025 15:41:06 GMT, Damon Fenacci wrote: >> Makes sense, thanks for the explanation! > > Do we still need `is_partial_array_copy` in production builds? It seems to be used only in an assertion block. Good catch. Actually, I don't think we need it at all. I removed it entirely in new commit. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2020676919

From galder at openjdk.org Mon Mar 31 09:32:09 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Mon, 31 Mar 2025 09:32:09 GMT
Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v5]
In-Reply-To:
References:
Message-ID:

On Mon, 31 Mar 2025 08:57:33 GMT, Emanuel Peter wrote:

>> We should extend the functionality of Verify.checkEQ:
>> - Allow different NaN encodings to be seen as equal (by default).
>> - Compare VectorAPI vectors.
>> - Compare Exceptions, and their messages.
>> - Compare arbitrary Objects via Reflection.
>>
>> Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418.
>
> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>
> - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects
> - Verify.Options refactor for Galder
> - Update test/hotspot/jtreg/compiler/lib/verify/Verify.java
>
>   Co-authored-by: Galder Zamarreño
> - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects
> - clean up test
> - JDK-8352869

Changes requested by galder (Author).

test/hotspot/jtreg/compiler/lib/verify/Verify.java line 297:

> 295: private void checkEQimpl(float a, float b, String field, Object aParent, Object bParent) {
> 296: if (isFloatEQ(a, b)) {
> 297: System.err.println("ERROR: Verify.checkEQ failed: value mismatch. check raw: " + verifyOptions.isFloatCheckWithRawBits);

I meant using JEP 378 text blocks, e.g.

Suggestion:

    System.err.printf("""
        ERROR: Verify.checkEQ failed: value mismatch. check raw: %b
          Values: %.1f vs %.1f
          Raw: %d vs %d
        """, isFloatCheckWithRawBits, a, b, Float.floatToRawIntBits(a), Float.floatToRawIntBits(b));

-------------

PR Review: https://git.openjdk.org/jdk/pull/24224#pullrequestreview-2728707044
PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2020697026

From roland at openjdk.org Mon Mar 31 09:33:28 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 31 Mar 2025 09:33:28 GMT
Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v4]
In-Reply-To:
References:
Message-ID:

> This is primarily motivated by 8275202 (C2: optimize out more
> redundant conditions). In the following code snippet:
>
>
> int[] array = new int[arraySize];
> if (j <= arraySize) {
>   if (i >= 0) {
>     if (i < j) {
>       int v = array[i];
>
>
> (`arraySize` is a constant)
>
> at the range check, `j` is known to be in `[min, arraySize]`; as a
> consequence, `i` is known to be in `[0, arraySize-1]`. The range check
> can be eliminated.
>
> Now, if later, `i` constant folds to some value that's positive but
> out of range for the array:
>
> - if that happens when the new pass runs, then it can prove that:
>
>   if (i < j) {
>
> is never taken.
>
> - if that happens during IGVN or CCP however, that condition is not
> constant folded. And because the range check was removed, there's no
> guard protecting the range check `CastII`. It becomes `top` and, as
> a result, the graph can become broken.
>
> What I propose here is that when the `CastII` becomes dead, any CFG
> path that uses the `CastII` node is made unreachable. So in pseudo code:
>
>
> int[] array = new int[arraySize];
> if (j <= arraySize) {
>   if (i >= 0) {
>     if (i < j) {
>       halt();
>
>
> Finding the CFG paths is implemented in the patch by following the
> uses of the node until a CFG node or a `Phi` is encountered.
> > The patch applies this to all `Type` nodes because, with 8275202, I also ran > into some rare corner cases with other types of nodes. The exception is > `Phi` nodes, which may not be as easy to handle (and for which I had no > issue with 8275202). > > Finally, the patch includes a test case that's unrelated to the > discussion of 8275202 above. In that test case, a `CastII` becomes top > but the test that guards it doesn't constant fold. The root cause is a > transformation of: > > > (CastII (AddI > > > into > > > (AddI (CastII ) (CastII) > > > which causes the resulting node to have a wider type. The `CastII` > captures a type before the transformation above happens. Once it has > happened, the guard for the `CastII` can't be constant folded when an > out-of-bounds value occurs. > > This is likely fixable some other way (even though it doesn't seem > straightforward). Given the long history of similar issues (and the > test case that shows that more of them are hiding), I think it would > make sense to try some other way of approaching them.
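The use-walk described in the message above (follow the uses of the dead node until a CFG node or a `Phi` is encountered) can be illustrated with a small toy model. This is only a sketch written for this archive, not the HotSpot code from the patch: the class `DeadTypeNodeWalk`, its `Node` and `Kind` types, and the node names in `demo()` are all hypothetical, and real C2 nodes carry far more state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model (hypothetical, not HotSpot code): walk the uses of a dead data
// node and stop at the first CFG node or Phi found on each use chain.
public class DeadTypeNodeWalk {
    enum Kind { DATA, CFG, PHI }

    static final class Node {
        final String name;
        final Kind kind;
        final List<Node> uses = new ArrayList<>();

        Node(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }

        // Records that 'user' takes this node as an input.
        Node usedBy(Node user) {
            uses.add(user);
            return this;
        }
    }

    // Returns the CFG nodes and Phis reachable from 'dead' through data uses only.
    static Set<Node> findPathEnds(Node dead) {
        Set<Node> ends = new LinkedHashSet<>();
        Set<Node> visited = new LinkedHashSet<>();
        Deque<Node> worklist = new ArrayDeque<>();
        worklist.push(dead);
        visited.add(dead);
        while (!worklist.isEmpty()) {
            Node n = worklist.pop();
            for (Node use : n.uses) {
                if (use.kind != Kind.DATA) {
                    ends.add(use);          // CFG node or Phi: stop here
                } else if (visited.add(use)) {
                    worklist.push(use);     // plain data node: keep following uses
                }
            }
        }
        return ends;
    }

    // Builds a tiny graph and returns the names of the nodes where the walk stops.
    static Set<String> demo() {
        Node cast = new Node("CastII", Kind.DATA);
        Node add = new Node("AddI", Kind.DATA);
        Node ifNode = new Node("If", Kind.CFG);
        Node phi = new Node("Phi", Kind.PHI);
        Node afterPhi = new Node("AddI#2", Kind.DATA);
        Node ret = new Node("Return", Kind.CFG);
        cast.usedBy(add).usedBy(phi);
        add.usedBy(ifNode);
        phi.usedBy(afterPhi);   // uses behind the Phi are not followed
        afterPhi.usedBy(ret);

        Set<String> names = new LinkedHashSet<>();
        for (Node n : findPathEnds(cast)) {
            names.add(n.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model the walk stops at `If` and `Phi`, and `Return` is never reached because uses behind a `Phi` are not followed. What the actual patch does with the nodes it finds (making the corresponding paths unreachable) is not modeled here.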
Roland Westrelin has updated the pull request incrementally with four additional commits since the last revision: - Update src/hotspot/share/opto/convertnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/convertnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/c2_globals.hpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23468/files - new: https://git.openjdk.org/jdk/pull/23468/files/f310865f..a412e8c1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23468&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23468&range=02-03 Stats: 7 lines in 3 files changed: 2 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23468.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23468/head:pull/23468 PR: https://git.openjdk.org/jdk/pull/23468 From mchevalier at openjdk.org Mon Mar 31 09:37:08 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 31 Mar 2025 09:37:08 GMT Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces [v2] In-Reply-To: <48D8vzTXZDKtZxAMTDdo9ggjWnWn7XNjs6rZqwuDZxc=.d833c90c-09da-4167-aec9-aba8b9e523b5@github.com> References: <-Ri4lJUzCkI9yLG-kGwTGeAhd453SDgt_qvoB1iw4_A=.f3e126ab-a4ff-4f7f-80a7-c6e739cc6727@github.com> <48D8vzTXZDKtZxAMTDdo9ggjWnWn7XNjs6rZqwuDZxc=.d833c90c-09da-4167-aec9-aba8b9e523b5@github.com> Message-ID: On Mon, 31 Mar 2025 06:46:51 GMT, Marc Chevalier wrote: >> Now I'm confused, isn't this what I proposed? I didn't check the exact syntax, I just wondered if the `TypeInterfaces::contains` method couldn't be used instead of adding a new method. > > Yes, totally! It's just a detail difference. 
But there is another question: whether we still want `has_non_array_interface` as a wrapper for this call with a more explicit name, or if we simply inline your suggestion at the call site of `has_non_array_interface`. I tend toward the first: I like explicit names, and I suspect it might be useful in more than one place, but it is not a strong opinion. For now, I just replaced the implementation of `has_non_array_interface`. If one feels even keeping the method is premature factorization, I can easily inline it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24245#discussion_r2020704570 From bkilambi at openjdk.org Mon Mar 31 09:55:20 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 31 Mar 2025 09:55:20 GMT Subject: RFR: 8348868: AArch64: Add backend support for SelectFromTwoVector In-Reply-To: References: Message-ID: <03mNhjjP_PvR9nxPUCaIkN5NF--gH7-AMqiHJlAzJW0=.e0e1cd1e-f236-4a6d-b9da-1459eed6077d@github.com> On Tue, 11 Feb 2025 20:20:54 GMT, Bhavana Kilambi wrote: > This patch adds aarch64 backend support for SelectFromTwoVector operation which was recently introduced in VectorAPI. > > It implements this operation using a two table vector lookup instruction - "tbl" which is available only in Neon and SVE2. > > For 128-bit vector length : Neon tbl instruction is generated if UseSVE < 2 and SVE2 "tbl" instruction is generated if UseSVE == 2. > > For > 128-bit vector length : Currently there are no machines which have vector length > 128-bit and support SVE2. For all those machines with vector length > 128-bit and UseSVE < 2, this operation is not supported. The inline expander for this operation would fail and lowered IR will be generated which is a mix of two rearrange and one blend operation. > > This patch also adds a boolean "need_load_shuffle" in the inline expander for this operation to test if the platform requires VectorLoadShuffle operation to be generated.
Without this, the lowering IR was not being generated on aarch64 and the performance was quite poor. > > Performance numbers with this patch on a 128-bit, SVE2 supporting machine is shown below - > > > Benchmark (size) Mode Cnt Gain > SelectFromBenchmark.selectFromByteVector 1024 thrpt 9 1.43 > SelectFromBenchmark.selectFromByteVector 2048 thrpt 9 1.48 > SelectFromBenchmark.selectFromDoubleVector 1024 thrpt 9 68.55 > SelectFromBenchmark.selectFromDoubleVector 2048 thrpt 9 72.07 > SelectFromBenchmark.selectFromFloatVector 1024 thrpt 9 1.69 > SelectFromBenchmark.selectFromFloatVector 2048 thrpt 9 1.52 > SelectFromBenchmark.selectFromIntVector 1024 thrpt 9 1.50 > SelectFromBenchmark.selectFromIntVector 2048 thrpt 9 1.52 > SelectFromBenchmark.selectFromLongVector 1024 thrpt 9 85.38 > SelectFromBenchmark.selectFromLongVector 2048 thrpt 9 80.93 > SelectFromBenchmark.selectFromShortVector 1024 thrpt 9 1.48 > SelectFromBenchmark.selectFromShortVector 2048 thrpt 9 1.49 > > > Gain column refers to the ratio of thrpt between this patch and the master branch after applying changes in the inline expander. Hello, I would not be able to respond to comments until the next couple months or so due to some urgent tasks at work. Until then, I'd move this PR to draft status so that it would not be closed due to lack of activity. Thank you for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23570#issuecomment-2765731068 From shade at openjdk.org Mon Mar 31 10:05:17 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 31 Mar 2025 10:05:17 GMT Subject: RFR: 8353176: C1: x86 patching stub always calls Thread::current() In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 11:05:33 GMT, Aleksey Shipilev wrote: > Noticed this while looking at compiled code density. In C1 PatchingStub code, we _always_ perform a runtime call to `Thread::current()`, even though we can rely on `r15_thread` to be available. 
This currently emits a huge sequence of instructions for register save/restore and the call itself. > > Current code calls to `MacroAssembler::get_thread()`, which is always doing that slowpath. This kind of accident would be less likely / impossible once we cleanup uses of `MacroAssembler::get_thread()` with [JDK-8353174](https://bugs.openjdk.org/browse/JDK-8353174). > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1` + `-XX:TieredStopAtLevel=1` > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` Thanks! I am in the middle of x86 C1 cleanups, but I believe this does not conflict with anything in flight. So I am integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24291#issuecomment-2765748786 From shade at openjdk.org Mon Mar 31 10:05:18 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 31 Mar 2025 10:05:18 GMT Subject: Integrated: 8353176: C1: x86 patching stub always calls Thread::current() In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 11:05:33 GMT, Aleksey Shipilev wrote: > Noticed this while looking at compiled code density. In C1 PatchingStub code, we _always_ perform a runtime call to `Thread::current()`, even though we can rely on `r15_thread` to be available. This currently emits a huge sequence of instructions for register save/restore and the call itself. > > Current code calls to `MacroAssembler::get_thread()`, which is always doing that slowpath. This kind of accident would be less likely / impossible once we cleanup uses of `MacroAssembler::get_thread()` with [JDK-8353174](https://bugs.openjdk.org/browse/JDK-8353174). 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1` + `-XX:TieredStopAtLevel=1` > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` This pull request has now been integrated. Changeset: 6fbaa066 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/6fbaa066ce45b70f1c288d1245b03fe18ceba126 Stats: 10 lines in 1 file changed: 0 ins; 6 del; 4 mod 8353176: C1: x86 patching stub always calls Thread::current() Reviewed-by: mdoerr, kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/24291 From roland at openjdk.org Mon Mar 31 10:12:06 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 31 Mar 2025 10:12:06 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v5] In-Reply-To: References: Message-ID: > This is primarily motivated by 8275202 (C2: optimize out more > redundant conditions). In the following code snippet: > > > int[] array = new int[arraySize]; > if (j <= arraySize) { > if (i >= 0) { > if (i < j) { > int v = array[i]; > > > (`arraySize` is a constant) > > at the range check, `j` is known to be in `[min, arraySize]` as a > consequence, `i` is known to be `[0, arraySize-1]`. The range check > can be eliminated. > > Now, if later, `i` constant folds to some value that's positive but > out of range for the array: > > - if that happens when the new pass runs, then it can prove that: > > if (i < j) { > > is never taken. > > - if that happens during IGVN or CCP however, that condition is not > constant folded. And because the range check was removed, there's no > guard protecting the range check `CastII`. It becomes `top` and, as > a result, the graph can become broken. > > What I propose here is that when the `CastII` becomes dead, any CFG > paths that use the `CastII` node is made unreachable. 
So in pseudo code: > > > int[] array = new int[arraySize]; > if (j <= arraySize) { > if (i >= 0) { > if (i < j) { > halt(); > > > Finding the CFG paths is implemented in the patch by following the > uses of the node until a CFG node or a `Phi` is encountered. > > The patch applies this to all `Type` nodes because, with 8275202, I also ran > into some rare corner cases with other types of nodes. The exception is > `Phi` nodes, which may not be as easy to handle (and for which I had no > issue with 8275202). > > Finally, the patch includes a test case that's unrelated to the > discussion of 8275202 above. In that test case, a `CastII` becomes top > but the test that guards it doesn't constant fold. The root cause is a > transformation of: > > > (CastII (AddI > > > into > > > (AddI (CastII ) (CastII) > > > which causes the resulting node to have a wider type. The `CastII` > captures a type before the transformation above happens. Once it has > happened, the guard for the `CastII` can't be constant folded when an > out-of-bounds value occurs. > > This is likely fixable some other way (even though it doesn't seem > straightforward). Given the long history of similar issues (and the > test case that shows that more of them are hiding), I think it would > make sense to try some other way of approaching them. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 11 additional commits since the last revision: - review - Merge branch 'master' into JDK-8349479 - Update src/hotspot/share/opto/convertnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/convertnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/c2_globals.hpp Co-authored-by: Christian Hagedorn - review - Merge branch 'master' into JDK-8349479 - review - whitespace - ... and 1 more: https://git.openjdk.org/jdk/compare/b1431a02...7b033117 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23468/files - new: https://git.openjdk.org/jdk/pull/23468/files/a412e8c1..7b033117 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23468&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23468&range=03-04 Stats: 8750 lines in 158 files changed: 4832 ins; 3470 del; 448 mod Patch: https://git.openjdk.org/jdk/pull/23468.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23468/head:pull/23468 PR: https://git.openjdk.org/jdk/pull/23468 From roland at openjdk.org Mon Mar 31 10:12:10 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 31 Mar 2025 10:12:10 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v3] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 18:41:58 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8349479 >> - review >> - whitespace >> - fix & test > > src/hotspot/share/opto/node.cpp line 3134: > >> 3132: size_t len = ss.size() + 1; >> 3133: char* arena_str = NEW_ARENA_ARRAY(igvn->C->comp_arena(), char, len); >> 3134: memcpy(arena_str, ss.base(), len); > > Does it make sense to move it into `stringStream::as_string()`? `stringStream::as_string()` already handles resource area and C-heap allocations. It does make sense. Implemented in new commit. I added a new method and there's some code duplication but it felt better than adding one more optional argument to the existing method. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2020758391 From roland at openjdk.org Mon Mar 31 10:21:20 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 31 Mar 2025 10:21:20 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v3] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 07:30:47 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8349479 >> - review >> - whitespace >> - fix & test > > src/hotspot/share/opto/castnode.cpp line 103: > >> 101: return this; >> 102: } >> 103: if (in(1) != nullptr && phase->type(in(1)) != Type::TOP) { > > Can `in(1)` ever be null? There's a good chance that it can never be null. 
I think it's been considered good practice over the years to be particularly defensive about this (there must be other Ideal transformations where inputs can be cleared as the graph is transformed) and I tend to add checks for null inputs systematically. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2020770414 From roland at openjdk.org Mon Mar 31 10:31:23 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 31 Mar 2025 10:31:23 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v6] In-Reply-To: References: Message-ID: <6y1XDY3d35rHf9NYDqKVc3om6QyQzhMIlJZsuzpeEiI=.05fb13cd-5023-4c48-827c-fe2f0f5a64eb@github.com> > This is primarily motivated by 8275202 (C2: optimize out more > redundant conditions). In the following code snippet: > > > int[] array = new int[arraySize]; > if (j <= arraySize) { > if (i >= 0) { > if (i < j) { > int v = array[i]; > > > (`arraySize` is a constant) > > at the range check, `j` is known to be in `[min, arraySize]`; as a > consequence, `i` is known to be in `[0, arraySize-1]`. The range check > can be eliminated. > > Now, if later, `i` constant folds to some value that's positive but > out of range for the array: > > - if that happens when the new pass runs, then it can prove that: > > if (i < j) { > > is never taken. > > - if that happens during IGVN or CCP however, that condition is not > constant folded. And because the range check was removed, there's no > guard protecting the range check `CastII`. It becomes `top` and, as > a result, the graph can become broken. > > What I propose here is that when the `CastII` becomes dead, any CFG > paths that use the `CastII` node are made unreachable. So in pseudo code: > > > int[] array = new int[arraySize]; > if (j <= arraySize) { > if (i >= 0) { > if (i < j) { > halt(); > > > Finding the CFG paths is implemented in the patch by following the > uses of the node until a CFG node or a `Phi` is encountered.
> > The patch applies this to all `Type` nodes because, with 8275202, I also ran > into some rare corner cases with other types of nodes. The exception is > `Phi` nodes, which may not be as easy to handle (and for which I had no > issue with 8275202). > > Finally, the patch includes a test case that's unrelated to the > discussion of 8275202 above. In that test case, a `CastII` becomes top > but the test that guards it doesn't constant fold. The root cause is a > transformation of: > > > (CastII (AddI > > > into > > > (AddI (CastII ) (CastII) > > > which causes the resulting node to have a wider type. The `CastII` > captures a type before the transformation above happens. Once it has > happened, the guard for the `CastII` can't be constant folded when an > out-of-bounds value occurs. > > This is likely fixable some other way (even though it doesn't seem > straightforward). Given the long history of similar issues (and the > test case that shows that more of them are hiding), I think it would > make sense to try some other way of approaching them.
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23468/files - new: https://git.openjdk.org/jdk/pull/23468/files/7b033117..d06a94e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23468&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23468&range=04-05 Stats: 15 lines in 1 file changed: 15 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23468.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23468/head:pull/23468 PR: https://git.openjdk.org/jdk/pull/23468 From roland at openjdk.org Mon Mar 31 10:38:21 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 31 Mar 2025 10:38:21 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References: Message-ID: > This is primarily motivated by 8275202 (C2: optimize out more > redundant conditions). In the following code snippet: > > > int[] array = new int[arraySize]; > if (j <= arraySize) { > if (i >= 0) { > if (i < j) { > int v = array[i]; > > > (`arraySize` is a constant) > > at the range check, `j` is known to be in `[min, arraySize]` as a > consequence, `i` is known to be `[0, arraySize-1]`. The range check > can be eliminated. > > Now, if later, `i` constant folds to some value that's positive but > out of range for the array: > > - if that happens when the new pass runs, then it can prove that: > > if (i < j) { > > is never taken. > > - if that happens during IGVN or CCP however, that condition is not > constant folded. And because the range check was removed, there's no > guard protecting the range check `CastII`. It becomes `top` and, as > a result, the graph can become broken. > > What I propose here is that when the `CastII` becomes dead, any CFG > paths that use the `CastII` node is made unreachable. 
So in pseudo code: > > > int[] array = new int[arraySize]; > if (j <= arraySize) { > if (i >= 0) { > if (i < j) { > halt(); > > > Finding the CFG paths is implemented in the patch by following the > uses of the node until a CFG node or a `Phi` is encountered. > > The patch applies this to all `Type` nodes because, with 8275202, I also ran > into some rare corner cases with other types of nodes. The exception is > `Phi` nodes, which may not be as easy to handle (and for which I had no > issue with 8275202). > > Finally, the patch includes a test case that's unrelated to the > discussion of 8275202 above. In that test case, a `CastII` becomes top > but the test that guards it doesn't constant fold. The root cause is a > transformation of: > > > (CastII (AddI > > > into > > > (AddI (CastII ) (CastII) > > > which causes the resulting node to have a wider type. The `CastII` > captures a type before the transformation above happens. Once it has > happened, the guard for the `CastII` can't be constant folded when an > out-of-bounds value occurs. > > This is likely fixable some other way (even though it doesn't seem > straightforward). Given the long history of similar issues (and the > test case that shows that more of them are hiding), I think it would > make sense to try some other way of approaching them.
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23468/files - new: https://git.openjdk.org/jdk/pull/23468/files/d06a94e5..1ec2177a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23468&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23468&range=05-06 Stats: 15 lines in 1 file changed: 0 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/23468.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23468/head:pull/23468 PR: https://git.openjdk.org/jdk/pull/23468 From roland at openjdk.org Mon Mar 31 10:38:21 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 31 Mar 2025 10:38:21 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v6] In-Reply-To: <6y1XDY3d35rHf9NYDqKVc3om6QyQzhMIlJZsuzpeEiI=.05fb13cd-5023-4c48-827c-fe2f0f5a64eb@github.com> References: <6y1XDY3d35rHf9NYDqKVc3om6QyQzhMIlJZsuzpeEiI=.05fb13cd-5023-4c48-827c-fe2f0f5a64eb@github.com> Message-ID: On Mon, 31 Mar 2025 10:31:23 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]` as a >> consequence, `i` is known to be `[0, arraySize-1]`. The range check >> can be eliminated. >> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. 
And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. >> >> What I propose here is that when the `CastII` becomes dead, any CFG >> paths that use the `CastII` node is made unreachable. So in pseudo code: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered. >> >> The patch applies this to all `Type` nodes as with 8275202, I also ran >> in some rare corner cases with other types of nodes. The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). >> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII)` >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out of bound value occurs. >> >> This is likely fixable some other way (eventhough it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that they are more hiding), I think it would >> make sense to try some other way of approaching them. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review > Thanks for adding the `KillPathsReachableByDeadTypeNode` switch! Now that #24246 is in, can you add `KillPathsReachableByDeadTypeNode` to the `TestAssertionPredicates.java` runs? 
To still run them with product, you can probably just use it together with `IgnoreUnrecognizedVMOptions`. Done in new commits. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2765825573 From chagedorn at openjdk.org Mon Mar 31 10:49:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 31 Mar 2025 10:49:15 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 16:01:47 GMT, Roland Westrelin wrote: >>> Added. The TraceLoopOpts crash reproduces: the code hits a malformed counted loop. I tweaked the printing code. >> >> Is the malformed counted loop expected or a different issue to look into? > > It doesn't look an actual issue to me. `PhiNode::Value` manages to narrow the trip `phi`'s type of the pre loop enough that it's a constant. So the loop no longer has the expected counted loop shape but the loop exit condition that should constant fold doesn't because it's guarded by an `Opaque1` node. This sounds very similar to [JDK-8297752](https://bugs.openjdk.org/browse/JDK-8297752) which also caused problems outside of `TraceLoopOpts`. We should probably handle this separately from this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2020814123 From epeter at openjdk.org Mon Mar 31 10:57:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 31 Mar 2025 10:57:17 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 08:09:35 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - fix tests >> - whitespace >> - whitespace >> - fix whitespace >> - JDK-8344942 > > test/hotspot/jtreg/compiler/lib/template_framework/README.md line 8: > >> 6: The basic functionalities of the Template Framework are described in the [Template Class](./Template.java), together with some examples. More examples can be found in [TestSimple.java](../../../testlibrary_tests/template_framework/examples/TestSimple.java) and [TestTutorial.java](../../../testlibrary_tests/template_framework/examples/TestTutorial.java). >> 7: >> 8: The [Template Library](../template_library/README.md) provides a large number of Templates which can be used to create anything from simple regression tests to complex fuzzers. > > `template_library/README.md` does not exist. > > Another thought: Should `template_library` be a subfolder in `template_framework`? Otherwise, it could suggest to be something separate from the Template Framework. Removing that line, and moving the library to a subfolder. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2020823170 From epeter at openjdk.org Mon Mar 31 11:08:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 31 Mar 2025 11:08:02 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v4] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
> > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/jtreg:./test/lib compiler.lib.template_framework` > > **History** > @TobiHartmann and I have played with code generators for a while, and have had the dream of doing that in a more principled way. And to hopefully... Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - manual merge - move library ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/85bcc6eb..b7e0f2b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=02-03 Stats: 109 lines in 4 files changed: 46 ins; 48 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Mon Mar 31 11:08:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 31 Mar 2025 11:08:05 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 08:18:59 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - fix tests >> - whitespace >> - whitespace >> - fix whitespace >> - JDK-8344942 > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 41: > >> 39: * the {@link Template}s provide hashtag replacements in the Strings: the {@link Template} argument >> 40: * names are captured, and the argument values automatically replace any {@code "#name"} in the Strings. See the >> 41: * different overloads of {@link make} for examples. Additional hashtag replacements can be defined with {@link let}. > > General Javadocs comment: You should preceed method names with `#` in order to create a proper link. > > Without: > ![image](https://github.com/user-attachments/assets/b53280a4-8d5a-40a6-84a6-07fc78aa0c2d) > > With (i.e. `{@link make}` and `{@link #let})`: > ![image](https://github.com/user-attachments/assets/1722614d-9535-410f-b2de-494ecd45a42a) interesting. As we saw offline, `javadoc` does not need the hashtag, but some editors do, and it seems to be common practice to have the hashtag. I'll add them for every applicable link. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2020843963 From epeter at openjdk.org Mon Mar 31 11:23:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 31 Mar 2025 11:23:43 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v5] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
> > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/jtreg:./test/lib compiler.lib.template_framework` > > **History** > @TobiHartmann and I have played with code generators for a while, and have had the dream of doing that in a more principled way. And to hopefully... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix hashtag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/b7e0f2b8..fa69d6dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=03-04 Stats: 33 lines in 5 files changed: 1 ins; 0 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Mon Mar 31 11:23:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 31 Mar 2025 11:23:43 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 11:04:07 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 41: >> >>> 39: * the {@link Template}s provide hashtag replacements in the Strings: the {@link Template} argument >>> 40: * names are captured, and the argument values automatically replace any {@code "#name"} in the Strings. See the >>> 41: * different overloads of {@link make} for examples. Additional hashtag replacements can be defined with {@link let}. >> >> General Javadocs comment: You should preceed method names with `#` in order to create a proper link. 
>> >> Without: >> ![image](https://github.com/user-attachments/assets/b53280a4-8d5a-40a6-84a6-07fc78aa0c2d) >> >> With (i.e. `{@link make}` and `{@link #let})`: >> ![image](https://github.com/user-attachments/assets/1722614d-9535-410f-b2de-494ecd45a42a) > > interesting. As we saw offline, `javadoc` does not need the hashtag, but some editors do, and it seems to be common practice to have the hashtag. I'll add them for every applicable link. I think I fixed all, but I cannot easily confirm. Can you check if there are any left? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2020862929 From epeter at openjdk.org Mon Mar 31 11:32:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 31 Mar 2025 11:32:36 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v5] In-Reply-To: References: Message-ID: <6xxAHK1tTsbsjH8TPbzwd8ubj82QYcAJqzjZUBU0wK4=.ac67feae-9e42-4a11-9ecf-387efd1e57c3@github.com> On Mon, 31 Mar 2025 09:28:27 GMT, Galder Zamarreño wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects >> - Verify.Options refactor for Galder >> - Update test/hotspot/jtreg/compiler/lib/verify/Verify.java >> >> Co-authored-by: Galder Zamarreño >> - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects >> - clean up test >> - JDK-8352869 > > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 297: > >> 295: private void checkEQimpl(float a, float b, String field, Object aParent, Object bParent) { >> 296: if (isFloatEQ(a, b)) { >> 297: System.err.println("ERROR: Verify.checkEQ failed: value mismatch. check raw: " + verifyOptions.isFloatCheckWithRawBits); > > I meant using JEP 378 text blocks, e.g.
> > Suggestion: > > System.err.printf(""" > ERROR: Verify.checkEQ failed: value mismatch. check raw: %b > Values: %.1f vs %.1f > Raw: %d vs %d > """, isFloatCheckWithRawBits, a, b, Float.floatToRawIntBits(a), Float.floatToRawIntBits(b)); I see. That has advantages and disadvantages. Advantage: You can more easily see the "skeleton" of the test. Disadvantage: Mapping the "holes" and the "values" is annoying, you basically have to count through each position. Plus it may end up being more lines. Best would really be String Templates. I asked @chhagedorn, he does not have an opinion either way. Personally, I prefer my way, where you can easily see what values go where directly. But this is probably a taste question. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2020873454 From roland at openjdk.org Mon Mar 31 11:49:09 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 31 Mar 2025 11:49:09 GMT Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 06:49:50 GMT, Marc Chevalier wrote: >> If `TypeInstKlassPtr` represents an array type, it has to be `java.lang.Object`. By contraposition, if it is not `java.lang.Object`, we can conclude it is not an array, and we can skip some array checks, for instance. >> >> In this PR, we improve this deduction with interface-based reasoning: arrays implement only Cloneable and Serializable, so if a type implements anything else, it cannot be an array. >> >> This change partially reverts the changes from [JDK-8348631](https://bugs.openjdk.org/browse/JDK-8348631) (#23331) (in `LibraryCallKit::generate_array_guard_common`) and the test still passes. >> >> The way interfaces are checked might be done differently.
The current situation is a balance between visibility (not leaking too many things that are explicitly private), not having overly general methods for one use case, and avoiding overly concrete (and brittle) interfaces. >> >> Tested with tier1..3, hs-precheckin-comp and hs-comp-stress >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > not reinventing the wheel src/hotspot/share/opto/memnode.cpp line 2214: > 2212: if (tkls->offset() == in_bytes(Klass::layout_helper_offset()) && > 2213: tkls->isa_instklassptr() && // not directly typed as an array > 2214: !tkls->is_instklassptr()->might_be_an_array() // not the supertype of all T[] (java.lang.Object) or has an interface that is not Serializable or Cloneable Could we do the same by using `TypeKlassPtr::maybe_java_subtype_of(TypeAryKlassPtr::BOTTOM)` and define a `TypeAryKlassPtr::BOTTOM` to be a static field for the `array_interfaces`? AFAICT, `TypeKlassPtr::maybe_java_subtype_of()` already covers that case so it would avoid some logic duplication. Also in the test above, maybe you could simplify the test a little bit by removing `tkls->isa_instklassptr()`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24245#discussion_r2020893305 From roland at openjdk.org Mon Mar 31 11:55:24 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 31 Mar 2025 11:55:24 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 10:46:50 GMT, Christian Hagedorn wrote: >> It doesn't look like an actual issue to me. `PhiNode::Value` manages to narrow the trip `phi`'s type of the pre loop enough that it's a constant. So the loop no longer has the expected counted loop shape but the loop exit condition that should constant fold doesn't because it's guarded by an `Opaque1` node.
> > This sounds very similar to [JDK-8297752](https://bugs.openjdk.org/browse/JDK-8297752) which also caused problems outside of `TraceLoopOpts`. We should probably handle this separately from this PR. Right. So maybe, we could treat that `Opaque` node the way we do for `OpaqueZeroTripGuard` and have it constant fold when the backedge is never taken. So I should revert the change to the `IdealLoopTree::dump_head()` and the test run with `TraceLoopOpts`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2020899830 From chagedorn at openjdk.org Mon Mar 31 12:03:36 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 31 Mar 2025 12:03:36 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v6] In-Reply-To: References: <6y1XDY3d35rHf9NYDqKVc3om6QyQzhMIlJZsuzpeEiI=.05fb13cd-5023-4c48-827c-fe2f0f5a64eb@github.com> Message-ID: On Mon, 31 Mar 2025 10:33:57 GMT, Roland Westrelin wrote: > > Thanks for adding the `KillPathsReachableByDeadTypeNode` switch! Now that #24246 is in, can you add `KillPathsReachableByDeadTypeNode` to the `TestAssertionPredicates.java` runs? To still run them with product, you can probably just use it together with `IgnoreUnrecognizedVMOptions`. > > Done in new commits. Great, thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23468#issuecomment-2766011470 From chagedorn at openjdk.org Mon Mar 31 12:03:37 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 31 Mar 2025 12:03:37 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v3] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 10:17:51 GMT, Roland Westrelin wrote: > I think it's been considered good practice over the year to be particularly defensive about this Makes sense from a stability point of view. I'm wondering though if it's not a bug when the cast input is null at this point. 
Aren't there only a few CFG nodes, like regions, where we set some inputs to null already? There is other code, for example in `ConvI2L::Ideal()`, that later accesses `in(1)` without null check: https://github.com/openjdk/jdk/blob/1ec2177a6b25573732b902f76bb81dd1cdaf7edf/src/hotspot/share/opto/convertnode.cpp#L728 To be consistent, we would also need to add a check for the other accesses in the method or turn the null check into a bailout for the entire `Ideal()` method. If we agree that null is unexpected (or assume it should be), we might also want to add asserts accordingly. My concern is that most IGVN methods assume non-control inputs cannot be null where we normally expect a sane input. This is probably true but hard to prove. To be consistent overall, we should also consider adding bailout and assertion code there. While it's the safest solution, this could introduce a lot of new code, especially for multi input nodes, which also makes it harder to read. What are your thoughts on that? Anyway, we don't need to make a decision as part of this PR on how we should generally handle inputs in IGVN methods. It's fine if we only concentrate on the touched/new code here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2020909000 From chagedorn at openjdk.org Mon Mar 31 12:29:11 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 31 Mar 2025 12:29:11 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v2] In-Reply-To: References: Message-ID: <9cGlvzZnXc8B5tNxXSE2Eqi2FDJzP26U7c-yan4ZdCc=.3f6b0821-8b6c-453d-87ee-91205cc6627a@github.com> On Mon, 31 Mar 2025 11:52:37 GMT, Roland Westrelin wrote: > So maybe, we could treat that Opaque node the way we do for OpaqueZeroTripGuard and have it constant fold when the backedge is never taken. Right, that sounds like a good solution.
> So I should revert the change to the IdealLoopTree::dump_head() and the test run with TraceLoopOpts? Yes, that would be great. We can make a comment in [JDK-8297752](https://bugs.openjdk.org/browse/JDK-8297752) to add `-XX:+TraceLoopOpts` as additional run to this test when we fix it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2020942323 From chagedorn at openjdk.org Mon Mar 31 12:33:33 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 31 Mar 2025 12:33:33 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead Message-ID: As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. 
Thanks, Christian ------------- Commit messages: - 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead Changes: https://git.openjdk.org/jdk/pull/24326/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24326&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352418 Stats: 42 lines in 2 files changed: 42 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24326.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24326/head:pull/24326 PR: https://git.openjdk.org/jdk/pull/24326 From epeter at openjdk.org Mon Mar 31 13:46:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 31 Mar 2025 13:46:34 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v6] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
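As a side note on the `withArgs`/`render` flow just described, the idea can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyTemplate2`, `ToyBound` and `renderExample` are hypothetical names invented here, they are not part of the Template Framework, and the real `Template`, `TemplateWithArgs` and `render` in the PR may well work differently. The sketch only shows the general pattern of a two-argument template whose typed arguments are bound first and substituted for `#name` placeholders at render time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: none of these classes belong to the Template Framework.
class ToyTemplateDemo {

    // A "template" with two typed arguments. It stores the argument names
    // and a body in which "#name" marks where an argument value goes.
    static final class ToyTemplate2<A, B> {
        private final String nameA;
        private final String nameB;
        private final String body;

        ToyTemplate2(String nameA, String nameB, String body) {
            this.nameA = nameA;
            this.nameB = nameB;
            this.body = body;
        }

        // Binding the arguments yields a separate object that can be rendered later.
        ToyBound withArgs(A a, B b) {
            Map<String, String> values = new LinkedHashMap<>();
            values.put(nameA, String.valueOf(a));
            values.put(nameB, String.valueOf(b));
            return new ToyBound(body, values);
        }
    }

    // A template together with its argument values.
    static final class ToyBound {
        private final String body;
        private final Map<String, String> values;

        ToyBound(String body, Map<String, String> values) {
            this.body = body;
            this.values = values;
        }

        // Naive textual replacement of every "#name". This toy does not handle
        // argument names that are prefixes of each other.
        String render() {
            String out = body;
            for (Map.Entry<String, String> e : values.entrySet()) {
                out = out.replace("#" + e.getKey(), e.getValue());
            }
            return out;
        }
    }

    // Mirrors the (42, "7") arguments from the description above.
    static String renderExample() {
        ToyTemplate2<Integer, String> template =
            new ToyTemplate2<>("value", "text", "int x = #value; String s = \"#text\";");
        return template.withArgs(42, "7").render();
    }

    public static void main(String[] args) {
        System.out.println(renderExample());
    }
}
```

In this toy, keeping the binding step separate from rendering is what allows the same template to be reused with different argument values; whether the framework makes the same split for the same reason is an assumption here.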
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: For Christian: example and more intro ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/fa69d6dd..77079807 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=04-05 Stats: 274 lines in 3 files changed: 272 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From rcastanedalo at openjdk.org Mon Mar 31 13:47:28 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 31 Mar 2025 13:47:28 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> Message-ID: <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> On Mon, 24 Mar 2025 15:33:34 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks.
I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... 
> > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Extend example with offset register mask I have reviewed the main bulk of this changeset (all HotSpot changes except those in `chaitin.*` and `ifg.cpp`) and have not found any functional issues. I will have a look at the remaining changes over the next days. Good job, Daniel! src/hotspot/share/opto/compile.cpp line 677: > 675: _java_calls(0), > 676: _inner_loops(0), > 677: _FIRST_STACK_mask(&_comp_arena), Suggestion: _FIRST_STACK_mask(comp_arena()), src/hotspot/share/opto/compile.cpp line 678: > 676: _inner_loops(0), > 677: _FIRST_STACK_mask(&_comp_arena), > 678: _regmask_arena(mtCompiler), Consider tagging this arena with a new `Arena::Tag`, for better compilation memory stat accuracy. Here is my suggested change: https://github.com/robcasloz/jdk/commit/efd3821877dea2f763cbb73364b19eb81a20a110. src/hotspot/share/opto/compile.cpp line 944: > 942: _java_calls(0), > 943: _inner_loops(0), > 944: _FIRST_STACK_mask(&_comp_arena), Suggestion: _FIRST_STACK_mask(comp_arena()), src/hotspot/share/opto/matcher.cpp line 148: > 146: C->record_method_not_compilable("unsupported incoming calling sequence"); > 147: return OptoReg::Bad; > 148: } Please consider removing the failure polls after calling `warp_incoming_stk_arg`, I believe the removal of this bailout makes them unnecessary. src/hotspot/share/opto/matcher.cpp line 195: > 193: if (C->failing()) { > 194: return; > 195: } Is this failure poll required after your changes? src/hotspot/share/opto/matcher.cpp line 196: > 194: return; > 195: } > 196: _return_addr_mask.Insert(return_addr()); Could you assert that `_return_addr_mask` is empty before the insertion, to make it easier to see that we are preserving the old behavior?
src/hotspot/share/opto/matcher.cpp line 982: > 980: STACK_ONLY_mask.Set_All_From(OptoReg::stack2reg(0)); > 981: > 982: OptoReg::Name i; Consider moving the declaration of `i` into the `for` statement below. src/hotspot/share/opto/optoreg.hpp line 237: > 235: } > 236: OptoRegPair(OptoReg::Name f) : OptoRegPair(OptoReg::Bad, f) {} > 237: OptoRegPair() : OptoRegPair(OptoReg::Bad, OptoReg::Bad) {} This is preexisting, but since the changeset touches the code: these two "partial" constructors seem unused, please consider removing them (but double-check in that case that they are unused for all platforms). src/hotspot/share/opto/postaloc.cpp line 686: > 684: assert(!(!value[ureg_lo] && lrgs(useidx).mask().is_offset() && > 685: !lrgs(useidx).mask().Member(ureg_lo)), > 686: "invalid assumption"); Could you use more descriptive names and assertion messages in this new assertion and the one below? Ideally, without having to refer to old versions. What is the invariant that we want to check? How does it relate to the surrounding code? src/hotspot/share/opto/regmask.hpp line 545: > 543: > 544: // Overlap test. Non-zero if any registers in common, including all-stack. > 545: bool overlap(const RegMask &rm) const { Please review the frequency of the different tests in this function. I ran an instrumented version and found the test in Case 4 to succeed (return true) more often than Case 2 and Case 3. src/hotspot/share/utilities/globalDefinitions.hpp line 1363: > 1361: // synchronized statements in Java. > 1362: const int BoxLockNode_slot_limit = 200; > 1363: This definition seems too C2-specific to be put in this shared file, could it be moved e.g. to `optoreg.hpp`? ------------- Changes requested by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2729213050 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021007017 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021020456 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021007752 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021067375 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021058074 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021050321 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021054876 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021034249 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021042384 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021029514 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021024332 From epeter at openjdk.org Mon Mar 31 13:46:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 31 Mar 2025 13:46:34 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v6] In-Reply-To: References: <_ISX3hSaQuWOvkh8KUsOS69y_aDB6JZGSWsVT1DWq4k=.de29649c-00f1-4e84-9a46-75ef89e8e30a@github.com> Message-ID: <63RLeeT7odhNj_IuH1ex3coX4SIqeWt2xNXt47KRcJA=.ef8d3969-6934-40f2-b781-544acaf4cbb7@github.com> On Mon, 31 Mar 2025 07:21:15 GMT, Galder Zamarreño wrote: >> Looks great though I'm not too familiar with the code to be able to do a reasonable review, but I had a question: >> >> Have you got any practical use case that can show where you've used this and show what it takes to build such a use case? `VectorReduction2` or similar type of microbenchmarks would be great to see auto generated using this?
>> >> The reason I ask this is because I feel that something that is missing in this PR is a small practical use case where this framework is put into action to actually generate some jtreg/IR/microbenchmark test and see how it runs as part of the CI in the PR. WDYT? > >> @galderz Thanks for your questions! >> >> > Looks great though I'm not too familiar with the code to be able to do a reasonable review >> >> Well the code is all brand new, so really anybody could review ;) > > Right, what I meant is that developers that have past history with this work will be able to provide a more thorough review :) > >> > Have you got any practical use case that can show where you've used this and show what it takes to build such a use case? >> >> I actually have a list of experiments in this branch (it is linked in the PR description): #23418 Some of them use the IR framework, though for now just as a testing harness, not for IR rules. Generating IR rules automatically requires quite a bit of logic... I hope that is satisfactory for now? > > Yes it is. Sorry I missed the linked PR when I read the description. The examples there look great, it's what I was looking for. > >> Ah, but there was this test: `test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java` I did now not refactor it, but it would not be too hard to see how to use the Templates for it. And I do generate IR rules in that one. I don't super like just refactoring old tests... there is always a risk of breaking it and then coverage is worse than before... >> >> > The reason I ask this is because I feel that something that is missing in this PR is a small practical use case where this framework is put into action to actually generate some jtreg/IR/microbenchmark test and see how it runs as part of the CI in the PR. WDYT? >> >> I also have a few tests in this PR that just generate regular JTREG tests, without the IR framework, did you see those? > > Yeah I've seen them now. 
> >> >> > VectorReduction2 or similar type of microbenchmarks would be great to see auto generated using this? >> >> I don't yet have a solution for microbenchmarks. It's mostly an issue of including the `test/hotspot/jtreg/compiler/lib` path... And I fear that JMH requires all benchmark code to be compiled beforehand, and not dynamically as I do with the class loader. But maybe there is a solution for that. >> >> The patch is already quite large, and so I wanted to just publish the basic framework. Do you think that is ok? > > Yeah sure. @galderz @chhagedorn I added an additional test now: https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 @chhagedorn I think I addressed all your suggestions :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2766280378 From epeter at openjdk.org Mon Mar 31 13:46:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 31 Mar 2025 13:46:37 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 08:39:36 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - fix tests >> - whitespace >> - whitespace >> - fix whitespace >> - JDK-8344942 > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 35: > >> 33: * >> 34: *

>> 35: * A {@link Template} can have zero or more arguments, and for each number of arguments there is an implementation > > Since the `README` refers to this file for more information (which is perfectly fine to avoid repetition), I naturally started to read here at the top. But in this paragraph, we are already explaining details of the implementation without giving a more general introduction and motivation for the Template Framework which can leave readers without background confused. I think it could be worth to spend some more time in the README (which you can then also just refer to when suggesting to use the framework in PRs). You could cover: Why is the framework useful/why should I care about it, some leading example/testing scenario and why it is really hard to cover that without the framework (i.e. what we've done so far until today), how does the framework roughly work, how easy is it to write tests (you can then reference example tests from there) etc. > > You can still do many references from the `README` to classes and examples. > > Side note: I admit that I could have extended the IR framework `README` with a more motivational introduction as well - maybe I will add that later. I added an additional test, and copied some snippets to the java docs. I also added some more motivation. Let me know if that is better, or if you have any more suggestions :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2021072263 From mchevalier at openjdk.org Mon Mar 31 13:48:28 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 31 Mar 2025 13:48:28 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 Message-ID: As the ticket says: > Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. So here it is! We can see that in case of inlining, indeed, no allocation happens. 
The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. Thanks, Marc ------------- Commit messages: - Turn TestContinuationPinningAndEA into IR test Changes: https://git.openjdk.org/jdk/pull/24328/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24328&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8348887 Stats: 136 lines in 1 file changed: 136 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24328/head:pull/24328 PR: https://git.openjdk.org/jdk/pull/24328 From duke at openjdk.org Mon Mar 31 14:05:38 2025 From: duke at openjdk.org (Manuel Hässig) Date: Mon, 31 Mar 2025 14:05:38 GMT Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 [v2] In-Reply-To: References: Message-ID: <2GDdehzIlhvP6Q4kz-QQZ4UXVh__NiG7TY54VTIlEgA=.3e1cbcc2-1f36-4dd7-ad0a-4813cb138f61@github.com> > # Issue Summary > > The `add_ring()` implementations of `OrINode` and `OrLNode` are missing the optimization that an or with a value where all bits are ones (since we have signed integers in this case `~0 == -1`) will always yield all ones. > > # Changes > > This PR makes the following straightforward changes: > - `Or(I|L)Node::add_ring()` returns `-1` if one of the two inputs is `-1`. > - Add `Or(I|L)` nodes to the IR framework. > - Add a regression IR test for the implemented optimization.
> > # Testing > > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14110978686) > - Ran tier1 through tier3 and Oracle internal testing Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Fix bug number in regression test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24289/files - new: https://git.openjdk.org/jdk/pull/24289/files/af9a3a67..f286eee0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24289&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24289&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24289/head:pull/24289 PR: https://git.openjdk.org/jdk/pull/24289 From epeter at openjdk.org Mon Mar 31 14:05:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 31 Mar 2025 14:05:39 GMT Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 [v2] In-Reply-To: <2GDdehzIlhvP6Q4kz-QQZ4UXVh__NiG7TY54VTIlEgA=.3e1cbcc2-1f36-4dd7-ad0a-4813cb138f61@github.com> References: <2GDdehzIlhvP6Q4kz-QQZ4UXVh__NiG7TY54VTIlEgA=.3e1cbcc2-1f36-4dd7-ad0a-4813cb138f61@github.com> Message-ID: On Mon, 31 Mar 2025 14:02:45 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> The `add_ring()` implementations of `OrINode` and `OrLNode` are missing the optimization that an or with a value where all bits are ones (since we have signed integers in this case `~0 == -1`) will always yield all ones. >> >> # Changes >> >> This PR makes the following straightforward changes: >> - `Or(I|L)Node::add_ring()` returns `-1` if one of the two inputs is `-1`. >> - Add `Or(I|L)` nodes to the IR framework. >> - Add a regression IR test for the implemented optimization.
>> >> # Testing >> >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14110978686) >> - Ran tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Fix bug number in regression test @mhaessig It looks good, just two comments below :) test/hotspot/jtreg/compiler/integerArithmetic/TestOrSaturate.java line 32: > 30: /* > 31: * @test > 32: * @bug 8352839 Suggestion: * @bug 8352893 Werchstabeverbuechslig ;) test/hotspot/jtreg/compiler/integerArithmetic/TestOrSaturate.java line 47: > 45: @Run(test = {"testL", "testI", "testDelayed"}) > 46: public static void check() { > 47: for (int i = 0; i < WARMUP; i++) { Do you actually need the WARMUP here? It does not look like this is a standalone test, so I thought the `@Run` actually gets called many times. Not sure if that is correct... ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24289#pullrequestreview-2729391754 PR Review Comment: https://git.openjdk.org/jdk/pull/24289#discussion_r2021095658 PR Review Comment: https://git.openjdk.org/jdk/pull/24289#discussion_r2021099459 From duke at openjdk.org Mon Mar 31 14:05:39 2025 From: duke at openjdk.org (Manuel Hässig) Date: Mon, 31 Mar 2025 14:05:39 GMT Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 [v2] In-Reply-To: References: <2GDdehzIlhvP6Q4kz-QQZ4UXVh__NiG7TY54VTIlEgA=.3e1cbcc2-1f36-4dd7-ad0a-4813cb138f61@github.com> Message-ID: On Mon, 31 Mar 2025 13:55:08 GMT, Emanuel Peter wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bug number in regression test > > test/hotspot/jtreg/compiler/integerArithmetic/TestOrSaturate.java line 32: > >> 30: /* >> 31: * @test >> 32: * @bug 8352839 > > Suggestion: > > * @bug 8352893 > > Werchstabeverbuechslig ;) Fixed in
[f286eee](https://github.com/openjdk/jdk/pull/24289/commits/f286eee0cb3f1ee55c4b1903c9d4e019bd9a4e57) > test/hotspot/jtreg/compiler/integerArithmetic/TestOrSaturate.java line 47: > >> 45: @Run(test = {"testL", "testI", "testDelayed"}) >> 46: public static void check() { >> 47: for (int i = 0; i < WARMUP; i++) { > > Do you actually need the WARMUP here? It does not look like this is a standalone test, so I thought the `@Run` actually gets called many times. Not sure if that is correct... The name `WARMUP` is not strictly correct, but I left the loop to test more than one value. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24289#discussion_r2021109966 PR Review Comment: https://git.openjdk.org/jdk/pull/24289#discussion_r2021107880 From chagedorn at openjdk.org Mon Mar 31 14:07:26 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 31 Mar 2025 14:07:26 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v5] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 08:57:33 GMT, Emanuel Peter wrote: >> We should extend the functionality of Verify.checkEQ: >> - Allow different NaN encodings to be seen as equal (by default). >> - Compare VectorAPI vectors. >> - Compare Exceptions, and their messages. >> - Compare arbitrary Objects via Reflection. >> >> Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects > - Verify.Options refactor for Galder > - Update test/hotspot/jtreg/compiler/lib/verify/Verify.java > > Co-authored-by: Galder Zamarreño > - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects > - clean up test > - JDK-8352869 Nice extensions! Some initial comments. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 25: > 23: > 24: package compiler.lib.verify; > 25: You should update the copyright year. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 37: > 35: /** > 36: * The {@link Verify} class provides {@link Verify#checkEQ}, which recursively compares the two > 37: * {@link Object}s by value. It deconstructs {@link Object[]}, compares boxed primitive types, Can be any object array, right? I would instead write: Suggestion: * {@link Object}s by value. It deconstructs an array of objects, compares boxed primitive types, test/hotspot/jtreg/compiler/lib/verify/Verify.java line 43: > 41: * > 42: *

> 43: * When a comparison fail, then methods print helpful messages, before throwing a {@link VerifyException}. Suggestion: * When a comparison fails, then methods print helpful messages, before throwing a {@link VerifyException}. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 46: > 44: * > 45: *

> 46: * We have to take special care of {@link Float}s and {@link Double}s, since they have both various Suggestion: * We have to take special care of {@link Float}s and {@link Double}s, since they both have various test/hotspot/jtreg/compiler/lib/verify/Verify.java line 47: > 45: *

> 46: * We have to take special care of {@link Float}s and {@link Double}s, since they have both various > 47: * encodings for NaN values, but on Java specification they are to be regarded as equal. Hence, we Suggestion: * encodings for NaN values while the Java specification regards them as equal. Hence, we test/hotspot/jtreg/compiler/lib/verify/Verify.java line 68: > 66: > 67: /** > 68: * Generates a {@link Options} with default settings. Suggestion: * Generates an {@link Options} object with default settings. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 86: > 84: * By default, we only support the comparison of a limited set of types, but with this option > 85: * enabled, we can compare arbitrary classes by value, and we compare the Objects by > 86: * the recursive structore given by their field values. Suggestion: * the recursive structure given by their field values. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 209: > 207: print(a, b, field, aParent, bParent); > 208: throw new VerifyException("Object type not supported: " + ca.getName() + " -- did you mean to 'enableCheckWithArbitraryClasses'?"); > 209: } What's the reason behind throwing instead of just comparing two arbitrary objects by default? If a user calls `Verify.checkEQ()` and sees this exception, I would guess he then just passes the additional option and we have the same result. But maybe I'm missing something. 
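As an aside on the NaN wording in the Javadoc suggestions above, the underlying behaviour can be checked in isolation. The following is only a minimal sketch, not code from `Verify.java`: the class `NaNEncodingSketch` and its helper methods are hypothetical names invented for illustration. It assumes only the standard `Float` and `Double` bit-conversion methods, where `floatToIntBits` and `doubleToLongBits` collapse every NaN to one canonical encoding.

```java
public class NaNEncodingSketch {
    // Hypothetical helper: compares floats by value, treating every NaN encoding as equal.
    static boolean sameFloatValue(float a, float b) {
        // Float.floatToIntBits maps all NaN encodings to the canonical NaN bit pattern.
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    // Hypothetical helper: the same idea for doubles.
    static boolean sameDoubleValue(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        // Two quiet NaNs that were requested with different payload bits.
        float nan1 = Float.intBitsToFloat(0x7fc00000);
        float nan2 = Float.intBitsToFloat(0x7fc00001);

        System.out.println(nan1 == nan2);                 // false: NaN is never == to anything
        System.out.println(sameFloatValue(nan1, nan2));   // true: both canonicalize to the same bits
        System.out.println(sameFloatValue(0.0f, -0.0f));  // false: the sign bit still differs
        System.out.println(sameDoubleValue(Double.NaN, 0.0 / 0.0)); // true
    }
}
```

Whether `Verify.checkEQ` uses this exact mechanism is not stated in the thread; the sketch only shows why a raw-bits or `==` comparison would be too strict for NaN values.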
------------- PR Review: https://git.openjdk.org/jdk/pull/24224#pullrequestreview-2729239101 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2021016321 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2021029237 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2021029655 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2021030792 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2021032586 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2021039554 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2021040835 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2021108161 From thartmann at openjdk.org Mon Mar 31 14:27:08 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 31 Mar 2025 14:27:08 GMT Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 [v2] In-Reply-To: References: <2GDdehzIlhvP6Q4kz-QQZ4UXVh__NiG7TY54VTIlEgA=.3e1cbcc2-1f36-4dd7-ad0a-4813cb138f61@github.com> Message-ID: <1fs73S8KvotH7WKvrwKf2wi5rlsF7g0Lic-YpjANtCs=.9e65ef24-cace-4d63-a072-11566b89f24a@github.com> On Mon, 31 Mar 2025 14:02:05 GMT, Manuel Hässig wrote: >> test/hotspot/jtreg/compiler/integerArithmetic/TestOrSaturate.java line 47: >> >>> 45: @Run(test = {"testL", "testI", "testDelayed"}) >>> 46: public static void check() { >>> 47: for (int i = 0; i < WARMUP; i++) { >> >> Do you actually need the WARMUP here? It does not look like this is a standalone test, so I thought the `@Run` actually gets called many times. Not sure if that is correct... > > The name `WARMUP` is not strictly correct, but I left the loop to test more than one value. As we discussed offline, I think this should use random values instead.
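For readers following the random-values suggestion above, the property under test can be sketched without the IR framework. This is not the code in `TestOrSaturate.java`; the class `OrSaturateSketch`, its method names, and the use of plain `java.util.Random` are assumptions made purely for illustration.

```java
import java.util.Random;

public class OrSaturateSketch {
    // OR with -1 (all bits set) yields -1 for any int input.
    static int orInt(int x) {
        return x | -1;
    }

    // The same identity for long, using the long constant -1L.
    static long orLong(long x) {
        return x | -1L;
    }

    public static void main(String[] args) {
        // A fresh random input per run replaces a fixed loop over hand-picked values.
        Random random = new Random();
        int i = random.nextInt();
        long l = random.nextLong();

        if (orInt(i) != -1) {
            throw new AssertionError("int case failed for input " + i);
        }
        if (orLong(l) != -1L) {
            throw new AssertionError("long case failed for input " + l);
        }
        System.out.println("ok");
    }
}
```

Including the failing input in the error message is a deliberate choice here: with random inputs, a failure is only reproducible if the value is reported.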
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24289#discussion_r2021154594 From duke at openjdk.org Mon Mar 31 14:31:43 2025 From: duke at openjdk.org (Manuel Hässig) Date: Mon, 31 Mar 2025 14:31:43 GMT Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 [v3] In-Reply-To: References: Message-ID: > # Issue Summary > > The `add_ring()` implementations of `OrINode` and `OrLNode` are missing the optimization that an or with a value where all bits are ones (since we have signed integers in this case `~0 == -1`) will always yield all ones. > > # Changes > > This PR makes the following straightforward changes: > - `Or(I|L)Node::add_ring()` returns `-1` if one of the two inputs is `-1`. > - Add `Or(I|L)` nodes to the IR framework. > - Add a regression IR test for the implemented optimization. > > # Testing > > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14110978686) > - Ran tier1 through tier3 and Oracle internal testing Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Remove loop in test and instead use random values ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24289/files - new: https://git.openjdk.org/jdk/pull/24289/files/f286eee0..e02adc43 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24289&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24289&range=01-02 Stats: 19 lines in 1 file changed: 3 ins; 2 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/24289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24289/head:pull/24289 PR: https://git.openjdk.org/jdk/pull/24289 From duke at openjdk.org Mon Mar 31 14:31:43 2025 From: duke at openjdk.org (Manuel Hässig) Date: Mon, 31 Mar 2025 14:31:43 GMT Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 [v3] In-Reply-To:
<1fs73S8KvotH7WKvrwKf2wi5rlsF7g0Lic-YpjANtCs=.9e65ef24-cace-4d63-a072-11566b89f24a@github.com> References: <2GDdehzIlhvP6Q4kz-QQZ4UXVh__NiG7TY54VTIlEgA=.3e1cbcc2-1f36-4dd7-ad0a-4813cb138f61@github.com> <1fs73S8KvotH7WKvrwKf2wi5rlsF7g0Lic-YpjANtCs=.9e65ef24-cace-4d63-a072-11566b89f24a@github.com> Message-ID: <501ulRgHK08Cw84-PRxgp0Ix3WIJ0U68tVurEGYCz60=.4658fe21-964c-41b7-912b-8b9952bebe6d@github.com> On Mon, 31 Mar 2025 14:23:26 GMT, Tobias Hartmann wrote: >> The name `WARMUP` is not strictly correct, but I left the loop to test more than one value. > > As we discussed offline, I think this should use random values instead. Changed to random value in [e02adc4](https://github.com/openjdk/jdk/pull/24289/commits/e02adc43ccfa4c944f646b93d311115cbce2589a) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24289#discussion_r2021166571 From shade at openjdk.org Mon Mar 31 15:40:52 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 31 Mar 2025 15:40:52 GMT Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal Message-ID: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> Piece-wise cleanup of C1_LIRAssembler_x86, C1_MacroAssembler and related classes. C1 implements the bulk of arch-specific backend there. Major parts of this backend are being removed by #24274, would re-merge after that PR integrates. 
Additional testing: - [ ] Linux x86_64 server fastdebug, `all` - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/24301/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24301&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353188 Stats: 1073 lines in 11 files changed: 2 ins; 1026 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/24301.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24301/head:pull/24301 PR: https://git.openjdk.org/jdk/pull/24301 From cushon at openjdk.org Mon Mar 31 16:08:57 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Mon, 31 Mar 2025 16:08:57 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v7] In-Reply-To: References: Message-ID: > Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. > > https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. > This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these > transitive uses to the worklist. > > The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for > CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". > We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: Explicitly check for OP_Con instead of TypeInteger::is_con. 
322 Phi === 303 119 255 [[ 399 388 351 751 366 377 ]] #int:-256..127 !jvms: Integer::parseInt @ bci:151 (line 625) While this Phi dumps as "#int:-256..127", `phase->type(expr)` returns a type that is_con -256. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23871/files - new: https://git.openjdk.org/jdk/pull/23871/files/8554ea87..b064c47b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23871&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23871/head:pull/23871 PR: https://git.openjdk.org/jdk/pull/23871 From cushon at openjdk.org Mon Mar 31 16:08:58 2025 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Mon, 31 Mar 2025 16:08:58 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v6] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 16:18:33 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. >> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". >> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. 
> > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/ccp/TestAndConZeroCCP.java > > Co-authored-by: Christian Hagedorn From Matthias --- I was able to reproduce the issue as is by Christian and I have a fix - with some caveats since I only have a partial understanding of what's happening. Here's what I know: In order to simplify EXPR & MASK, AndINode::Value() compares the trailing zero bits of EXPR against the width of MASK. For this, we use phase->type(expr)->is_con(). What I observe for the Integer.parseInt reproducer is that `expr` _dumps_ as a phi node with type #int:-256...127, but phase->type(expr) returns a type that is_con() with value -256. In consequence, the AND(phi-node, mask) gets optimized to zero. Concretely, my understanding is that the node is "digit" here: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Integer.java#L618,L622,L627 618 int digit = ~0xFF; lo = -256 622 ..if.. { digit = digit(firstChar, radix); .. } assumes Latin1 => byte => hi = 127 from here: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/CharacterDataLatin1.java.template#L233 627: int result = -(digit & 0xFF); I can only guess why we'd get an is_con() type for this node, I assume it's a speculative optimization, but I would expect that to happen all the time. Why reducing this phi node to zero would cause an issue, though, I'm out of my depth there. Anyway, my first instinct is to replace the constant check in the optimization `phase->type(expr)->is_con()` with an explicit opcode check (== OP_ConI || OP_ConL). This a) fixes the crash, b) passes all tests we added/changed for the new optimization, so it doesn't undo anything we were trying to accomplish in the first place.
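The arithmetic behind the observation above can be checked on its own. The following is a minimal sketch in plain Java, not HotSpot code; `DigitMaskSketch` and `maskLowByte` are hypothetical names chosen for illustration. It shows that the constant `~0xFF` masks to zero, while other values in the range -256..127 do not, so folding `digit & 0xFF` to zero is only sound when the input really is that constant.

```java
public class DigitMaskSketch {
    // Mirrors the shape of the expression "digit & 0xFF" quoted from Integer.parseInt.
    static int maskLowByte(int digit) {
        return digit & 0xFF;
    }

    public static void main(String[] args) {
        int sentinel = ~0xFF; // all bits set except the low eight

        System.out.println(sentinel);              // -256
        System.out.println(maskLowByte(sentinel)); // 0: the low byte of -256 is clear
        System.out.println(maskLowByte(127));      // 127: upper end of the observed range
        System.out.println(maskLowByte(-1));       // 255: also inside -256..127, not zero
    }
}
```

This says nothing about why C2 reported a constant type for the Phi; it only illustrates why treating the whole range as the constant -256 changes the result of the masking.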
I've pushed a corresponding commit, ptal: - if (type->is_con()) { + if (expr->Opcode() == Op_ConI || expr->Opcode() == Op_ConL) { It does feel like it's addressing a symptom though, not a cause. That being said, the "And[IL]Node::Value" optimization for "const & mask => 0" has always been a "happy byproduct", the actual goal was always the optimization in "::Ideal": "((expr + const) << shift) & mask => expr & mask", which still works. LMK what you think. Unfortunately, I have been unable to reproduce this outside of jl.Integer.parseInt. Even when copying jl.Integer.parseInt to my own method, I wasn't able to trigger the crash. Matthias ------------- PR Comment: https://git.openjdk.org/jdk/pull/23871#issuecomment-2766699676 From shade at openjdk.org Mon Mar 31 18:46:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 31 Mar 2025 18:46:53 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v2] In-Reply-To: References: Message-ID: <2HQ4RI4tsr1vs81DbkYw7J7omhy1EnEoatZENNTttpg=.243a25be-71eb-486b-8c04-29295bcea9b9@github.com> > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow.
> > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Shared utility class for method unload blocking - Merge branch 'master' into JDK-8231269-compile-task-weaks - JNIHandles -> VM(Weak) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/cc8345e9..d965fef3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=00-01 Stats: 67228 lines in 1914 files changed: 19554 ins; 40216 del; 7458 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Mon Mar 31 18:46:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 31 Mar 2025 18:46:53 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v2] In-Reply-To: <4ZbIg2yTtJjQUwkCjO_Klnv0e4_DLNaRzxxpJa4g9RU=.9f32f9f2-b50c-495d-8188-3207a061e7b3@github.com> References: <4ZbIg2yTtJjQUwkCjO_Klnv0e4_DLNaRzxxpJa4g9RU=.9f32f9f2-b50c-495d-8188-3207a061e7b3@github.com> Message-ID: <7agnCAcIk7-kFFHM4-gXSaOYsMwXY9xdY3hpOtOyoIs=.02b3766c-c9b1-48df-9fcc-53ca892dd230@github.com> On Fri, 28 Mar 2025 22:01:43 GMT, Vladimir Ivanov wrote: > What do you think about making 1 step further and encapsulating weak/strong reference handling into a helper class? Yes. I think @veresov would want to have some of this for persistent profiles JEP and `TrainingData`. I pushed the WIP thing into PR. That only covers the "method unload blocker" part. But I think it should _really_ go further, and encapsulate `Method*` completely, since it is not safe to touch `Method*` when its holder is not blocked for unload. We dodge the problems now by obsessively checking `is_unloading()` all over the place, but we need to guarantee this more mechanically. I'll take a look at that tomorrow. 
> Also, as an optimization idea: seems like weak + strong handles form a union (none -> weak -> strong). So, once a strong reference is captured, corresponding weak handle can be cleared straight away. It turns out to be necessary to avoid touching `peek()` when in the wrong thread state, when we encapsulate `Method*` as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2767098568 From cslucas at openjdk.org Mon Mar 31 21:22:17 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 31 Mar 2025 21:22:17 GMT Subject: RFR: 8334046: Set different values for CompLevel_any and CompLevel_all [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 03:43:03 GMT, Cesar Soares Lucas wrote: >> Please review this trivial patch to set different values for CompLevel_any and CompLevel_all. >> Setting different values for these fields make the implementation of [this other issue](https://bugs.openjdk.org/browse/JDK-8313713) much cleaner/easier. >> Tested on OSX/Linux Aarch64/x86_64 with JTREG. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix WhiteBox constants. @vnkozlov - if you could also please take a look as the original author of the touched code. Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24298#issuecomment-2767425077 From vlivanov at openjdk.org Mon Mar 31 21:27:19 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 31 Mar 2025 21:27:19 GMT Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal In-Reply-To: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> Message-ID: On Fri, 28 Mar 2025 17:11:14 GMT, Aleksey Shipilev wrote: > Piece-wise cleanup of C1_LIRAssembler_x86, C1_MacroAssembler and related classes. 
C1 implements the bulk of arch-specific backend there. Major parts of this backend are already removed by #24274, this cleans up another large bulk, and hopefully most of it. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` Looks good. src/hotspot/cpu/x86/c1_FrameMap_x86.cpp line 45: > 43: Register reg = r_1->as_Register(); > 44: if (r_2->is_Register() && (type == T_LONG || type == T_DOUBLE)) { > 45: Register reg2 = r_2->as_Register(); FTR `reg2` is unused. (Moreover, `r_2` and `r_2->is_Register()` are redundant on x64.) src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 827: > 825: // compressed klass ptrs: T_METADATA can be a compressed klass > 826: // ptr or a 64 bit method pointer. > 827: ShouldNotReachHere(); Alternatively, you could drop the whole `T_METADATA` case and defer the handling to default case. src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 3063: > 3061: ExternalAddress((address)double_signflip_pool), > 3062: rscratch1); > 3063: Is it intentional or just a leftover? ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24301#pullrequestreview-2730522710 PR Review Comment: https://git.openjdk.org/jdk/pull/24301#discussion_r2021774792 PR Review Comment: https://git.openjdk.org/jdk/pull/24301#discussion_r2021778835 PR Review Comment: https://git.openjdk.org/jdk/pull/24301#discussion_r2021781614 From vlivanov at openjdk.org Mon Mar 31 22:03:38 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 31 Mar 2025 22:03:38 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 10:38:21 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). 
>> In the following code snippet:
>>
>>
>> int[] array = new int[arraySize];
>> if (j <= arraySize) {
>>   if (i >= 0) {
>>     if (i < j) {
>>       int v = array[i];
>>
>>
>> (`arraySize` is a constant)
>>
>> at the range check, `j` is known to be in `[min, arraySize]`; as a consequence, `i` is known to be in `[0, arraySize-1]`. The range check can be eliminated.
>>
>> Now, if later, `i` constant folds to some value that's positive but out of range for the array:
>>
>> - if that happens when the new pass runs, then it can prove that:
>>
>>   if (i < j) {
>>
>>   is never taken.
>>
>> - if that happens during IGVN or CCP however, that condition is not constant folded. And because the range check was removed, there's no guard protecting the range check `CastII`. It becomes `top` and, as a result, the graph can become broken.
>>
>> What I propose here is that when the `CastII` becomes dead, any CFG path that uses the `CastII` node is made unreachable. So in pseudo code:
>>
>>
>> int[] array = new int[arraySize];
>> if (j <= arraySize) {
>>   if (i >= 0) {
>>     if (i < j) {
>>       halt();
>>
>>
>> Finding the CFG paths is implemented in the patch by following the uses of the node until a CFG node or a `Phi` is encountered.
>>
>> The patch applies this to all `Type` nodes because, with 8275202, I also ran into some rare corner cases with other types of nodes. The exception is `Phi` nodes, which may not be as easy to handle (and for which I had no issue with 8275202).
>>
>> Finally, the patch includes a test case that's unrelated to the discussion of 8275202 above. In that test case, a `CastII` becomes top but the test that guards it doesn't constant fold. The root cause is a transformation of:
>>
>>
>> (CastII (AddI
>>
>>
>> into
>>
>>
>> (AddI (CastII ) (CastII)
>>
>>
>> which causes the resulting node to have a wider type. The `CastII` captures a type before the transformation above happens.
>> Once it has happened, the guard for the `CastII` can't be constant folded when an out-of-bounds value occurs.
>>
>> This is likely fixable some other way (even though it doesn't seem straightforward). Given the long history of similar issues (and the test case that shows that there are more hiding), I think it would make sense to try some other way of approaching them.
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
> review

Marked as reviewed by vlivanov (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/23468#pullrequestreview-2730617282

From vlivanov at openjdk.org Mon Mar 31 22:03:38 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Mon, 31 Mar 2025 22:03:38 GMT
Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 31 Mar 2025 10:08:54 GMT, Roland Westrelin wrote:

>> src/hotspot/share/opto/node.cpp line 3134:
>>
>>> 3132: size_t len = ss.size() + 1;
>>> 3133: char* arena_str = NEW_ARENA_ARRAY(igvn->C->comp_arena(), char, len);
>>> 3134: memcpy(arena_str, ss.base(), len);
>>
>> Does it make sense to move it into `stringStream::as_string()`? `stringStream::as_string()` already handles resource area and C-heap allocations.
>
> It does make sense. Implemented in new commit. I added a new method and there's some code duplication but it felt better than adding one more optional argument to the existing method. What do you think?

Looks good! I definitely prefer the overload-based solution you came up with.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2021833681

From vlivanov at openjdk.org Mon Mar 31 22:31:26 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Mon, 31 Mar 2025 22:31:26 GMT
Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v46]
In-Reply-To:
References:
Message-ID:

On Sun, 30 Mar 2025 03:14:32 GMT, Johannes Graham wrote:

>> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic.
>>
>> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it.
>>
>> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items:
>> - Bounds optimization of xor
>> - A check for `x ^ x = 0`
>> - Explicit testing of xor over booleans.
>>
>> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types.
>
> Johannes Graham has updated the pull request incrementally with one additional commit since the last revision:
>
> add missing import

Thanks.

> The naming of that method evolved during the course of the review of this PR. I believe the thinking was that the check was not necessarily an overall upper bound, and a simpler name would imply it was more general.

There's usually a lot of invariants a function assumes and it's simply impractical to encode everything in the name.
Speaking of this particular case (`calc_xor_upper_bound_of_non_neg`):

* `calc_` is redundant and IMO only adds noise;
* `_non_neg` part is confusing; I'd stress instead that it works on **ranges**.

So, `xor_upper_bound_for_ranges` then? (And, please, explain in the comment what the correspondence is between the `S` and `U` template type parameters.)

> `addnodeXorUtil.hpp`

I'm fine with placing it under `opto`. Please, rename the file to `src/hotspot/share/opto/utilities/xor.hpp`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2767559589

From vlivanov at openjdk.org Mon Mar 31 23:43:31 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Mon, 31 Mar 2025 23:43:31 GMT
Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v2]
In-Reply-To: <2HQ4RI4tsr1vs81DbkYw7J7omhy1EnEoatZENNTttpg=.243a25be-71eb-486b-8c04-29295bcea9b9@github.com>
References: <2HQ4RI4tsr1vs81DbkYw7J7omhy1EnEoatZENNTttpg=.243a25be-71eb-486b-8c04-29295bcea9b9@github.com>
Message-ID:

On Mon, 31 Mar 2025 18:46:53 GMT, Aleksey Shipilev wrote:

>> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this.
>>
>> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow.
>>
>> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often.
>>
>> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer.
>>
>> Additional testing:
>> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code
>> - [x] Linux x86_64 server fastdebug, `all`
>> - [x] Linux AArch64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
> - Shared utility class for method unload blocking
> - Merge branch 'master' into JDK-8231269-compile-task-weaks
> - JNIHandles -> VM(Weak)

Nice! I really like how it shapes out.

src/hotspot/share/runtime/methodUnloadBlocker.inline.hpp line 72:

> 70: assert(!is_unloaded(), "Pre-condition: should not be unloaded");
> 71:
> 72: if (!_weak_handle.is_empty()) {

Does the precondition imply that `!_weak_handle.is_empty()` always holds?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2767653397
PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2021915508
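
The "none -> weak -> strong" progression discussed in this thread can be pictured with a small standalone C++ sketch. This is not the HotSpot `MethodUnloadBlocker` code and makes no claim about how the PR implements it: `std::weak_ptr` and `std::shared_ptr` merely stand in for a weak and a strong handle, and the names `Holder`, `UnloadBlockerSketch`, `block_unloading` and `holds_strong` are hypothetical, chosen for illustration only.

```cpp
#include <cassert>
#include <memory>

// Illustrative stand-in for a class holder object; not a HotSpot type.
struct Holder { int id; };

// Hypothetical sketch of the none -> weak -> strong progression.
class UnloadBlockerSketch {
  std::weak_ptr<Holder>   _weak;    // observes the holder without keeping it alive
  std::shared_ptr<Holder> _strong;  // keeps the holder alive once captured
  bool _tracked = false;            // false means the "none" state: nothing to track

public:
  UnloadBlockerSketch() = default;
  explicit UnloadBlockerSketch(const std::shared_ptr<Holder>& h)
    : _weak(h), _tracked(h != nullptr) {}

  // True if a holder was tracked weakly and has since gone away.
  bool is_unloaded() const {
    return _tracked && !_strong && _weak.expired();
  }

  // Promote weak -> strong. Returns false if the holder is already gone.
  bool block_unloading() {
    if (!_tracked || _strong) return true;  // "none" or already strong: nothing to do
    _strong = _weak.lock();
    if (!_strong) return false;             // lost the race: holder already released
    _weak.reset();                          // strong captured, weak no longer needed
    return true;
  }

  bool holds_strong() const { return _strong != nullptr; }
};
```

Keeping the two references in separate fields means no check is needed to find out which kind of handle a field currently contains, which is the cost the original PR description attributes to reusing one JNI handle field. Clearing the weak reference as soon as the strong one is captured mirrors the optimization idea quoted earlier in the thread.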